Obtaining an Improved Therapeutic Ligand

ABSTRACT

Methods and associated apparatus involving designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target, comprising using information about non-bonding, intra-molecular or inter-molecular atom to atom contacts extracted from a database of biological macromolecules to identify favoured regions adjacent to the binding site for particular atom types and modifying a candidate ligand to increase the intersection between atoms of the candidate ligand and the favoured regions. One or more steps of the methods may be performed by a computer.

The present invention relates to obtaining an improved therapeutic ligand, in particular by determining how an existing or candidate ligand can be modified to improve binding of the ligand at a binding site on a target protein or by aiding the de novo design of a candidate ligand as a precursor to a therapeutic.

Therapeutic molecules (ligands) fall into two distinct classes: chemical entities (or novel chemical entities, NCEs) and biologicals. The former are low molecular weight organic compounds, typically of molecular weight of 500 Daltons or less, that have been chemically synthesized or isolated from natural products. These are typically derived from starting chemicals or ‘hits’ that are discovered by screening chemical or natural product libraries. Such hits typically have sub-optimal binding affinity for the target and considerable trial and error in chemical modification is required in order to obtain better affinity for the target (typically of affinity constant (K_D) low micromolar or less). It is preferable that the hit has a lower molecular weight, say 300 Daltons or less, so that subsequent chemical modification does not exceed the 500 Dalton limit. These hits are often referred to as ‘fragments’. Optimisation of the hit to obtain a candidate therapeutic or lead molecule is greatly enhanced by structural information; for instance by obtaining an x-ray crystallographic structure of the protein in co-complex with the hit molecule or fragment. Such data provides insight into where on the target protein the small molecule binds and importantly indicates how atomic interactions between the two account for binding. Furthermore the topographical nature of the protein surface immediately surrounding the bound hit is revealed; and particularly if it is a cleft or a pocket, the structure will suggest how the hit might be elaborated to better fill the space within the pocket and how to make further interactions with the protein and hence improve binding affinity and specificity.

There are a number of computer based algorithms available to assist the medicinal chemist in making rational choices for chemical elaboration of the hit. These are either physics based methods that attempt to calculate the free energy of binding between the small molecule and protein from first principles (e.g. Schrödinger® suite of software) or are statistical potential methods that rely on a database of atomic interactions extracted from collections of protein-small molecule structures (e.g. SuperStar).

Biologicals are large peptide or protein molecules (of molecular weight greater than 1000 Daltons). They are often antibodies or antibody like molecules that recognize and bind to a target molecule, usually with better affinity and specificity compared to NCEs (K_D low nanomolar or less). They may also be other types of protein molecules such as hormones, cytokines, growth factors or soluble receptors.

The binding of a candidate biological therapeutic molecule to a binding site can be modified by mutating the candidate therapeutic molecule. This may be required to improve the binding affinity or alter the binding specificity. However, it is relatively time-consuming to perform the mutation and to test the binding efficiency of the mutated molecule. Many different mutations may be required before improvements in binding efficiency are obtained.

It is known to use computers to predict what kind of modifications might be most effective. However, a given molecule can be modified in a vast number of ways and it is difficult to configure a computer so that the prediction can be achieved reliably in a practical period of time.

Laskowski R A, Thornton J M, Humblet C & Singh J (1996) “X-SITE: use of empirically derived atomic packing preferences to identify favourable interaction regions in the binding sites of proteins”, Journal of Molecular Biology, 259, 175-201 discloses a computer-based method for identifying favourable interaction regions for different atom types at the surface of a protein, such as at a dimer interface or at a molecular-recognition or binding site. The Laskowski et al predictions are based on a database of empirical data about non-bonding intra-molecular contacts observed in high-resolution protein structures.

In the approach of Laskowski et al, the 20 amino acids are broken up to yield a total of 488 possible 3-atom fragments. Taking chemical similarities into account these are reduced to a set of 163 fragment types that sub-divide the database. Each fragment contains a first atom (referred to as “position 3”) with two further atoms defining triangulation (or spatial normalization) positions. A density function is derived by recording the various positions at which an atom (which may be referred to as a “second atom”) is found to be in a non-bonding intra-molecular contact with the first atom of a 3-atom fragment.

A predicted favourable interaction region for a given atom type is obtained in Laskowski et al by transplanting density functions into the binding site. Each density function is transplanted such that the coordinates of the three atoms of the 3-atom fragment corresponding to the density function are superimposed on the coordinates of a corresponding 3-atom fragment in the binding site. Where density functions from different 3-atom fragments in the binding site overlap, an average “density” is used to predict the favourable interaction region.

The approach of Laskowski et al is relatively complex and discards potentially useful data. The density functions of Laskowski et al are obtained by populating a 3-D grid with the positions of second atom contacts for each fragment type. A different grid is used for each of the 163 fragment types. Data for each second atom type is then mathematically transformed to give the density function. Using the fragment definitions of Laskowski et al, a fragment type can be shared by different atoms on the same residue and by atoms on different residues. When the Laskowski et al database has been built on these fragment types there is an over-abundance of main chain fragments that requires a down-weighting at the stage of transplanting the density functions into the binding site. Furthermore, a given fragment type can include several actual fragments with subtle differences in bond lengths and angles and concomitant differences in second atom distributions which become masked when combined in these divisions.

Short range secondary structure in the proteins used for deriving the empirical data in Laskowski et al can lead to bias and reduce the efficiency with which the empirical data indicates favourable interaction regions.

It is an object of the invention to address at least one of the problems with the art discussed above.

According to an aspect of the invention, there is provided a method for designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target, comprising:

a) identifying a target list of atoms forming the surface of the target binding site;

b) identifying each atom, hereinafter referred to as a theta atom, in the target list, as a particular theta atom type;

c) extracting from a structural database of biological macromolecules, information about non-bonding, intra-molecular or inter-molecular atom to atom contacts, where the first atom in a contacting pair of atoms is of a particular theta atom type and the opposing, second atom of the pair, hereinafter referred to as an iota atom, is of a particular iota atom type, said information comprising spatial and/or contextual data about the iota atom relative to the theta atom, and said data collected for a plurality of contacts of the given theta atom type from the said database is hereinafter referred to as a theta contact set;

d) for each theta atom identified in the target list in step b), superimposing in or around the target binding site data relating to a given iota atom type, or a predetermined group of related iota atom types, from the corresponding theta contact set extracted in step c);

e) combining and/or parsing the superimposed data in such a way as to predict one or more favoured regions of the binding site where the given iota atom type, or the predetermined group of related iota atom types, has high theoretical propensity; and

f) with a candidate ligand notionally docked into the binding site, comparing the type and position of one or more of the atoms of the candidate ligand with the predicted favoured regions for the respective iota atom types, to identify a modification to the candidate ligand, in terms of alternate and/or additional candidate ligand atoms, that will produce a greater intersection between the alternate and/or additional candidate ligand atoms and the respective iota atom type favoured regions, leading to an improvement in the affinity of the modified candidate ligand to the binding site compared to the unmodified candidate ligand;

wherein each non-bonding intra-molecular or inter-molecular contact in the database is defined as a contact between opposing residues of a protein fold or between opposing monomer units of a macromolecular fold or between two interacting macromolecular partners and is specifically between a theta atom on one side of the fold or first interacting partner and an iota atom on the opposing side or second interacting partner; in an instance where the following condition is satisfied:

s − Rw ≤ t, where s is the separation between the two atoms of the contact, Rw is the sum of the van der Waals radii of the two atoms of the contact, and t is a predetermined threshold distance; and

wherein the theta atom type is identified uniquely in step b) such that there is no intersection between the data of a theta contact set extracted in step c) for a given theta atom type and the data of any other theta contact set extracted in step c) for any other theta atom type, apart from data concerning contacts involving the given theta atom as the iota atom.

Thus, each target atom type in the binding site is classified uniquely and is associated with information about a set of contacts extracted from the structural database that is unique and which does not overlap with the set of contacts associated with any other atom type (apart from those contacts which involve the target atom type itself as the iota atom). This means that a distribution of theoretical locations for a given iota atom type, or a predetermined group of related iota atom types, determined based on one target atom in the binding site may be combined (e.g. by summing) more efficiently (e.g. without weighting) with a distribution of theoretical locations for an iota atom type, or predetermined group of related iota atom types, determined based on another target atom in the binding site, for example to provide an improved prediction of one or more favoured regions for the iota atom type or predetermined group of related iota atom types. Also, because each target atom in the binding site is classified uniquely, there are no variations in bond lengths or angles to consider and hence the theoretical location of a given iota atom is more precise.

In an embodiment, simple rules are applied to uniquely identify the neighbouring atoms for the purposes of triangulation. No assumptions need to be made about the chemical nature of neighbouring atoms, as is necessary, for example, where contact types are characterized in terms of the 163 3-atom fragment types of Laskowski et al.

In an embodiment, the spatial data extracted in step c) defines the position of each iota atom specified in the theta contact set by geometrical reference to the position of the theta atom and to the positions of third and fourth atoms, wherein the third atom is covalently bonded to the theta atom and the fourth atom is covalently bonded to the third atom. In an example of such an embodiment, for each iota atom specified in the theta contact set, said spatial data extracted in step c) further defines the position of fifth and sixth atoms by geometrical reference to the position of the theta atom and to the positions of the third and fourth atoms, wherein the fifth atom is covalently bonded to the iota atom and the sixth atom is covalently bonded to either the fifth atom or the iota atom.

In an embodiment, the superimposition in or around the target site of step (d) comprises: parsing the theta contact set to extract spatial data for contacts comprising the given iota atom type or one or more of the predetermined group of related iota atom types; and plotting this spatial data to determine theoretical locations representing where each iota atom type, or each of the one or more of the predetermined group of related iota atom types, would be located if: i) the theta atom of the contact were located at the position of the corresponding theta atom in the target binding site; and ii) the third and fourth atoms of the contact were located at the positions of the third and fourth atoms of the corresponding theta atom in the target binding site. In an embodiment, the spatial data is parsed against the contextual data before the plotting step.
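One way to picture the plotting step is as a change of reference frame. The following Python sketch is illustrative only and is not taken from the specification: it assumes that each stored iota position is expressed in a normalized frame with the theta atom at the origin, the third atom on the positive x-axis and the fourth atom in the x-y plane at positive y. The function names `site_frame` and `transplant` are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _scale(a, k):
    return tuple(p * k for p in a)

def _unit(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def site_frame(theta, third, fourth):
    """Right-handed orthonormal axes defined by three binding-site atoms.

    e1 points from the theta atom to the third atom; e2 is the part of the
    theta-to-fourth direction perpendicular to e1; e3 completes the frame.
    Assumes the three atoms are not collinear.
    """
    e1 = _unit(_sub(third, theta))
    v = _sub(fourth, theta)
    e2 = _unit(_sub(v, _scale(e1, _dot(v, e1))))
    e3 = _cross(e1, e2)
    return e1, e2, e3

def transplant(normalized_iota, theta, third, fourth):
    """Map a normalized iota position onto the binding-site coordinates."""
    e1, e2, e3 = site_frame(theta, third, fourth)
    x, y, z = normalized_iota
    return tuple(theta[i] + x * e1[i] + y * e2[i] + z * e3[i] for i in range(3))
```

Applying `transplant` to every stored contact of a theta contact set, for each theta atom on the target list, would yield the theoretical locations referred to above. Building an explicit orthonormal frame is a design choice of this sketch; reversing the stored rotations would be an equivalent alternative.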

In an embodiment, a region in which a density of theoretical locations for the iota atom type (or one or more of the predetermined groups of related iota atom types) is above a predetermined threshold is identified as one of the favoured regions. In an example of such an embodiment, theoretical locations for the given iota atom type, or for one or more of the predetermined group of related iota atom types, are determined for a plurality of theta atoms on the target list and a region in which a density of the cumulative theoretical locations is above the predetermined threshold is identified as one of the favoured regions.

Thus, theoretical locations are combined cumulatively from different atoms in the binding site before the density of theoretical locations is obtained for the purposes of predicting favoured regions. This results in a more accurate statistical representation of the probability of a given iota atom type, or a given group of related iota atom types, being positioned at a given location because it takes into account the contributions from all relevant atom types in the binding site in a proportionate and unbiased manner. In Laskowski et al., in contrast, the density functions are derived for groups of 3-atom fragments. Each atom may be associated with several different groups of 3-atom fragments and so it is not possible simply to add together density functions in a manner comparable with embodiments of the present invention. Instead, it is necessary to perform weighting and/or averaging before combining density functions, which increases complexity and/or reduces accuracy.
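The cumulative combination described above can be sketched as a simple unweighted count on a grid. The Python example below is a minimal illustration under stated assumptions: the cubic voxel binning, the voxel size and the threshold value are choices made for the sketch and are not specified in the text, and the name `favoured_voxels` is hypothetical.

```python
import math
from collections import Counter

def favoured_voxels(locations_per_theta_atom, voxel_size=1.0, threshold=2):
    """Sum theoretical iota locations from all theta atoms, then threshold.

    locations_per_theta_atom: one list of (x, y, z) theoretical locations
    per theta atom on the target list. Locations are pooled without any
    weighting before the density (count per voxel) is evaluated.
    Returns a dict mapping voxel index -> count for voxels whose count is
    above the threshold.
    """
    counts = Counter()
    for locations in locations_per_theta_atom:
        for x, y, z in locations:
            voxel = (math.floor(x / voxel_size),
                     math.floor(y / voxel_size),
                     math.floor(z / voxel_size))
            counts[voxel] += 1
    return {voxel: n for voxel, n in counts.items() if n > threshold}
```

Because every theta atom contributes its own non-overlapping contact set, the counts can simply be added; no averaging or down-weighting step appears in the sketch.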

In an embodiment, only contacts between atoms that are separated from each other by four residues or more are used for identifying favoured regions. This significantly reduces or avoids bias due to short range secondary structure. In an embodiment, the contact data predominantly represents long-range, across-fold protein data.

According to a further aspect of the invention, there is provided a method of generating a database for use in a method for designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target, comprising:

analysing the relative positions of atoms in each of a plurality of proteins or other biological macromolecules in order to identify instances of a non-bonding intra-molecular contact between a first atom, referred to as a theta atom, and a second atom, referred to as an iota atom, of the protein or macromolecule; and

generating a database that for each identified contact specifies: the type of the theta atom, the type of the iota atom, and the position of the iota atom relative to the theta atom;

wherein a non-bonding intra-molecular contact is defined as an instance where the following conditions are satisfied:

s − Rw ≤ t, where s is the separation between the theta and iota atoms, Rw is the sum of the van der Waals radii of the theta and iota atoms, and t is a predetermined threshold distance of typically 2.5 angstroms and preferably 0.8 angstroms; and

wherein in the case of proteins, the theta and iota atoms are on amino acid residues separated from each other by at least four residues on a linear polypeptide or are on separate polypeptide chains.

According to a further aspect of the invention, there is provided a method of generating a database for use in a method for designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target, comprising:

analysing the relative positions of atoms in each of a plurality of proteins or other biological macromolecules in order to identify instances of a non-bonding intra-molecular contact between a first atom, referred to as a theta atom, and a second atom, referred to as an iota atom, of the protein or macromolecule; and

generating a database that for each identified contact specifies: the type of the theta atom, the type of the iota atom, and the position of the iota atom relative to the theta atom;

wherein a non-bonding intra-molecular contact is defined as an instance where the following condition is satisfied:

s − Rw ≤ t, where s is the separation between the theta and iota atoms, Rw is the sum of the van der Waals radii of the theta and iota atoms, and t is a predetermined threshold distance of typically 2.5 angstroms and preferably 0.8 angstroms; and

wherein the method comprises sub-dividing the database to form groups of identified contacts in which the theta atom is one and only one of the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins and the iota atom is in one and only one of a plurality of non-overlapping groups obtained by sorting the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins into groups based on chemical similarity.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which corresponding reference symbols represent corresponding parts, and in which:

FIG. 1 is a schematic illustration of an example nomenclature for atoms at a non-bonding intra-molecular or inter-molecular contact and neighbouring atoms that are used for coordinate normalization;

FIGS. 2-6 illustrate a process of coordinate normalization for an atom in a non-bonding intra-molecular or inter-molecular contact;

FIG. 7 is a flow chart illustrating steps in a method of designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target;

FIG. 8 is a computer generated visualization depicting a light chain threonine 30 to arginine 30 mutation;

FIG. 9 is a computer generated visualization depicting a light chain arginine 54 to serine 54 mutation;

FIG. 10 is a computer generated visualization depicting a light chain serine 56 to isoleucine 56 mutation;

FIG. 11 is a computer generated visualization depicting a light chain serine 60 to aspartate 60 mutation;

FIG. 12 is a computer generated visualization depicting a light chain threonine 72 to arginine 72 mutation;

FIG. 13 is a computer generated visualization depicting the combination of 5 mutations in antibody 496i light chain resulting in a 180-fold improved affinity to IL17F;

FIG. 14 is a flow chart illustrating steps in a method of predicting the effects of point mutations at the VH-VL interface of a Fab;

FIG. 15 is a computer generated visualisation depicting a heavy chain threonine 71 to arginine 71 mutation;

FIG. 16 is a computer generated visualisation depicting a light chain serine 107 to glutamic acid 107 mutation;

FIG. 17 is a computer generated visualisation depicting a light chain threonine 109 to isoleucine 109 mutation;

FIG. 18 is a computer generated visualisation depicting the combination of three mutations in Fab X resulting in a Tm of 81.2° C.

The Worldwide Protein Data Bank (wwPDB) maintains an archive of macromolecular structural data that is freely and publicly available to the global community. By May 2013 this dataset had reached the milestone of 90 000 structures. Most of these macromolecules are proteins of which the majority have been determined by X-ray crystallography. Deposited data thus contains three dimensional data at the atomic level in the form of Cartesian coordinates of individual atoms that make up the respective protein structure.

The inventors hypothesised that it is possible to extract useful information from this archive which could be applied to aid the design of novel therapeutics. The polypeptide chains of a nascent protein fold into complex three dimensional tertiary and quaternary structures in a remarkably reproducible manner to yield the mature protein. Interactions affecting the formation of the secondary structure of proteins, elements such as helices, beta-sheets and turns, are known. However, rules predicting the higher orders of protein folding are poorly understood. Nonetheless, the inventors have realised that there must be precise rules that govern the interaction of non-bonding but “contacting” atoms, either within the same molecule, for example on opposing faces of a protein fold, or on different molecules.

In an embodiment, a structural database of biological macromolecules (e.g. the wwPDB) is analysed to extract such rules, and the rules are applied to facilitate drug discovery. An example of such a process is described below.

In an embodiment, non-bonding pairs of contact atoms (referred to respectively as “theta” and “iota” atoms) are identified for each macromolecule (e.g. protein), or a subset of fewer than all of the macromolecules, in the structural database of macromolecules (e.g. the wwPDB). Such contacts may occur for example between opposing residues of a protein fold or between opposing monomer units of a macromolecular fold (between separate chains of a macromolecular structure) or between two interacting macromolecular partners. Each contact is classified as being between a theta atom on one side of the fold or first interacting partner and an iota atom on the opposing side or second interacting partner.

In an embodiment, a non-bonding intra-molecular or inter-molecular contact is defined as an instance where the following condition is satisfied: 1) s − Rw ≤ t, where s is the separation between the two atoms of the contact, Rw is the sum of the van der Waals radii of the two atoms of the contact, and t is a predetermined threshold distance; and, optionally, the following condition also: 2) the two atoms of the contact are separated from each other by at least four residues along a linear polypeptide chain or are on separate polypeptide chains. In an embodiment, the predetermined threshold distance is 2.5 angstroms. In another embodiment, the predetermined threshold distance is 1.5 angstroms. In another embodiment, the predetermined threshold distance is 1.0 angstroms. In another embodiment, the predetermined threshold distance is 0.8 angstroms.
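As a hedged illustration of conditions 1) and 2), the following Python sketch tests a single pair of atoms. The van der Waals radii in the table are commonly quoted values chosen for the example, not values given in the specification, and the function name `is_contact` and its parameters are hypothetical.

```python
import math

# Illustrative van der Waals radii in angstroms (assumed values for this sketch).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def is_contact(pos_a, pos_b, elem_a, elem_b, threshold=0.8,
               res_a=None, res_b=None, same_chain=True, min_residue_gap=4):
    """Return True if two atoms form a non-bonding contact.

    Condition 1: s - Rw <= t, with s the inter-atomic separation, Rw the sum
    of the two van der Waals radii and t the threshold (angstroms).
    Condition 2 (optional, applied only when residue numbers are given):
    atoms on the same chain must be at least min_residue_gap residues apart;
    atoms on separate chains always pass.
    """
    s = math.dist(pos_a, pos_b)
    rw = VDW_RADII[elem_a] + VDW_RADII[elem_b]
    if s - rw > threshold:
        return False
    if same_chain and res_a is not None and res_b is not None:
        return abs(res_a - res_b) >= min_residue_gap
    return True
```

For example, a carbon and an oxygen 3.5 angstroms apart give s − Rw = 0.28 angstroms with these radii, which satisfies condition 1 for any of the threshold distances listed above.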

In the description below, any reference to “contact” is understood to mean “non-bonding intra-molecular contact or inter-molecular contact” according to the definition given above.

Databases such as the wwPDB may have information about proteins that are very similar to each other and/or which have related structures. In an embodiment, the database is parsed in order to avoid/reduce bias caused by such similarities/relationships. In an embodiment, the parsing is performed based on primary sequence homology, for example such that only one representative structure of each family of similar/related proteins is selected for analysis. Additionally or alternatively, one or more further selection criteria may be used, for example high resolution and low temperature factor structures may be incorporated.

In an embodiment, a secondary database is constructed starting from the (primary) structural database of biological macromolecules (e.g. the wwPDB). The secondary database comprises information about the non-bonding intra-molecular or inter-molecular contacts. In an embodiment, the secondary database comprises information about more than 1 million contact pairs, optionally more than 5 million contact pairs, optionally more than 11 million contact pairs. In one embodiment, the secondary database comprises information from more than 15 million contact atom pairs, extracted from around 20 000 non-homologous proteins.

In an embodiment, the secondary database contains information about the precise atom types of the contact pair. In an embodiment, the secondary database contains spatial data defining the three dimensional relationship of the theta atom to the iota atom. In an embodiment, the secondary database also contains contextual data concerning the local environment of the contact. In an embodiment, the contextual data contains information concerning the local environment of each contact pair, including one or more of the following in any combination: secondary structure, amino acid types or other monomer types comprising the contact pair, adjacent monomer units and/or local geometry thereof in a polymer chain either side of the contact, adjacent amino acids in a polypeptide chain on either side of the contact, local geometry of the said adjacent monomer units or amino acids, temperature factor of the theta atom, temperature factor of the iota atom, accessible surface area of the theta atom, accessible surface area of the iota atom, the number of different iota atom contacts for the particular theta atom and the number of other theta atoms on the same monomer unit as the theta atom.

In an embodiment, the 3-D coordinates of the contact pair and covalently attached adjacent atoms are normalized, as a group, to a common database reference frame as described below. This simplifies subsequent analysis of potential underlying contact patterns or rules and application of any such rules to drug design.

In an embodiment, the theta atom type is identified as being one and only one of: the 167 covalent atom types (excluding hydrogen) that make up the 20 natural amino acid building blocks of proteins (in this case the secondary database may be divided accordingly and comprise information about up to 27889 (167×167) different contact types); and/or the 82 non-hydrogen atoms present in the 4 nucleotides of the deoxyribonucleic acid polymer (DNA); and/or the 42 non-hydrogen atoms present in the methylated DNA nucleotides, cytidine phosphate and adenosine phosphate; and/or the 85 non-hydrogen atoms present in the 4 nucleotide phosphates of the ribonucleic acid polymer (RNA); and/or the 89 non-hydrogen atoms present in 2-O′-methylated ribose nucleotide phosphates of RNA; and/or the over 400 non-hydrogen atoms present in the commonest post-transcription base modified RNA.

In an embodiment, the iota atom type is identified as being one and only one of: the 167 covalent atom types (excluding hydrogen) that make up the 20 natural amino acid building blocks of proteins; and/or the oxygen atom present in protein bound, structurally relevant, water molecules (this may be useful because crystal structures in the primary database often contain structurally relevant water molecules, i.e. certain protein atoms show definite interactions with bound water molecules); and/or the 82 non-hydrogen atoms present in the 4 nucleotides of the deoxyribonucleic acid polymer (DNA); and/or the 42 non-hydrogen atoms present in the methylated DNA nucleotides, cytidine phosphate and adenosine phosphate; and/or the 85 non-hydrogen atoms present in the 4 nucleotide phosphates of the ribonucleic acid polymer (RNA); and/or the 89 non-hydrogen atoms present in 2-O′-methylated ribose nucleotide phosphates of RNA; and/or the over 400 non-hydrogen atoms present in the commonest post-transcription base modified RNA.

In a contact pair the opposing atom is viewed and recorded from either side of the contact. The nomenclature in an example embodiment is described below and illustrated schematically in FIG. 1.

The atom on the reference side of the contact is termed the theta atom 1 whilst the opposing atom is termed the iota atom 2. In this example, the further atoms used for normalizing the 3-D coordinates are defined as follows. The next atom to which the theta atom 1 is covalently bonded, in the direction of the C alpha atom of that amino acid, is referred to as the third atom 3 and the next atom again, the fourth atom 4. The fourth atom 4 is covalently bonded to the third atom 3. The next atom to which the iota atom 2 is covalently bonded, in the direction of the C alpha atom of the respective amino acid, is termed the fifth atom 5 and the next again atom, the sixth atom 6. The sixth atom 6 is covalently bonded to either the fifth atom 5 or the iota atom 2.

In an embodiment, to avoid instances of ambiguity the third and fourth atoms are chosen uniquely for each specified theta atom type. In an embodiment, the fifth and sixth atoms are also chosen uniquely. In an embodiment, the following convention is applied. If the theta atom 1 happens to be a C alpha atom, then the third and fourth atoms are the backbone carbonyl carbon and oxygen atoms respectively. If the theta atom 1 is a backbone carbonyl carbon, then the third atom 3 and the fourth atom 4 are the C alpha carbon and the backbone nitrogen respectively. If the theta atom 1 is the backbone nitrogen, then the third atom 3 and the fourth atom 4 are the C alpha carbon and the backbone carbonyl carbon respectively. If the theta atom 1 is a C beta carbon atom, then the third atom 3 and the fourth atom 4 are the C alpha carbon and the backbone carbonyl carbon respectively. In phenylalanine and tyrosine side chains where there is a choice of two epsilon carbon atoms for the third and fourth atom positions, then the atom closest to the backbone nitrogen atom is selected.

In an embodiment, coordinate normalisation of each contact is performed on the theta, iota, third and fourth atoms, optionally also the fifth and sixth atoms, as a group so that their 3-D relationship is maintained. The resulting normalized coordinates may be referred to as a normalized coordinate group. In an embodiment, this is achieved by carrying out the following steps in sequence, as illustrated in FIGS. 2-6.

FIG. 2 illustrates a theta atom 1, third atom 3 and fourth atom 4 of a non-bonding intra-molecular or inter-molecular contact positioned relative to a reference frame, defined relative to x-, y- and z-axes, according to coordinates given in a primary database (such as the wwPDB). In a first step of an example coordinate normalization process, the atom group coordinates are translated so that the theta atom 1 lies at the zero coordinate (FIG. 3). Next, the group as a whole is rotated about the z-axis until the third atom 3 is at y=0 (FIG. 4). Next, the group is rotated about the y-axis until the third atom 3 is at y=0 and z=0 (FIG. 5). Next, the group is rotated about the x-axis until the fourth atom 4 is at z=0 and the group as a whole lies in the x-y plane (all three atoms at z=0; FIG. 6). In this manner each of the 167 theta atom types can be superimposed for that type and the secondary database sub-divided accordingly. In turn each of the theta atom divisions can be sub-divided into 167 iota atom types, facilitating the analysis of the spatial distribution of each iota atom type relative to each theta atom type.
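The translate-then-rotate sequence can be sketched in a few lines of Python. This is an illustrative sketch rather than the inventors' implementation: it assumes the final step places the fourth atom in the x-y plane (z=0) at positive y, and the names `normalize_group` and `_rotate` are hypothetical.

```python
import math

def _rotate(points, axis, angle):
    """Rotate points about a coordinate axis ('x', 'y' or 'z') by angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        if axis == "z":
            out.append((c * x - s * y, s * x + c * y, z))
        elif axis == "y":
            out.append((c * x + s * z, y, -s * x + c * z))
        else:  # axis == "x"
            out.append((x, c * y - s * z, s * y + c * z))
    return out

def normalize_group(theta, third, fourth, others=()):
    """Normalize a contact group to a common reference frame.

    Returns the transformed points in the order theta, third, fourth,
    followed by any others (e.g. the iota, fifth and sixth atoms).
    """
    pts = [theta, third, fourth, *others]
    # Step 1: translate so that the theta atom sits at the origin.
    tx, ty, tz = theta
    pts = [(x - tx, y - ty, z - tz) for x, y, z in pts]
    # Step 2: rotate about z until the third atom is at y = 0.
    x, y, _ = pts[1]
    pts = _rotate(pts, "z", -math.atan2(y, x))
    # Step 3: rotate about y until the third atom is also at z = 0 (on the x-axis).
    x, _, z = pts[1]
    pts = _rotate(pts, "y", math.atan2(z, x))
    # Step 4: rotate about x until the fourth atom is at z = 0 (in the x-y plane).
    _, y, z = pts[2]
    pts = _rotate(pts, "x", -math.atan2(z, y))
    return pts
```

Because only a translation and rotations are applied, all inter-atomic distances in the group are preserved, so the 3-D relationship between the theta, iota and neighbouring atoms is maintained.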

In an embodiment, the distribution patterns of iota atoms relative to theta atoms are analysed in order to identify similarities between the distribution patterns for nominally different iota atom types. In this way, the unique iota atom types (e.g. the 167 covalent atom types mentioned above) can be combined into a number of groups (herein referred to as “predetermined groups of related iota atom types”) to simplify subsequent use of the data. Grouping together the atom types according to the similarity of distribution patterns reduces the computational load associated with the method described below with reference to FIG. 7 for example, thus increasing speed and/or reducing hardware expense.

In an embodiment, this process is simplified by using polar coordinates rather than Cartesian coordinates (in an embodiment, this is achieved by performing conversion processing between Cartesian coordinates and polar coordinates, for example where the data in the primary database is presented using Cartesian coordinates). In an embodiment, two-dimensional polar coordinates are used, specifying the relative positions of the theta and iota atoms in terms only of the two polar angles θ (theta) and Φ (phi) (corresponding to latitude and longitude on a globe). The resulting two-dimensional latitude-longitude plots do not show any information about variations in the distance between the theta and iota atoms. However, it is found that this distance is relatively constant, so that the theta-phi plots contain most of the relevant information concerning the contact. Reducing the analysis to a problem in two dimensions rather than three greatly improves the efficiency of subsequent analyses. In an embodiment, contour lines are used to illustrate variations in the relative position of the iota atom. The contour lines may represent lines of constant “density” or probability of a relative positioning of the theta and iota atoms.
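As a rough illustration of the Cartesian-to-polar conversion and of how a two-dimensional theta-phi frequency plot might be accumulated, consider the following Python sketch. The function names (`to_polar`, `angle_histogram`) and the 10 degree bin width are hypothetical. The angle convention (θ measured from the +z axis, Φ measured in the x-y plane from the +x axis) is an assumption; it appears consistent with the Ox, Oy, Oz, OsphR and OsphT values listed in Table 2, but the text does not state it explicitly.

```python
import math
from collections import Counter

def to_polar(x, y, z):
    """Convert normalised Cartesian coordinates to (r, theta, phi) in radians.

    Assumed convention: theta is the angle from the +z axis (0..pi) and
    phi is the angle in the x-y plane from the +x axis (-pi..pi).
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp against rounding
    phi = math.atan2(y, x)
    return r, theta, phi

def angle_histogram(points, bin_deg=10.0):
    """Count iota positions per (theta, phi) bin, discarding the distance r."""
    n_theta = int(180 // bin_deg)
    n_phi = int(360 // bin_deg)
    counts = Counter()
    for x, y, z in points:
        _, theta, phi = to_polar(x, y, z)
        i = min(int(math.degrees(theta) // bin_deg), n_theta - 1)
        j = min(int((math.degrees(phi) + 180.0) // bin_deg), n_phi - 1)
        counts[(i, j)] += 1
    return counts
```

Contour lines of constant density could then be drawn over the binned counts with any plotting package; that step is omitted here.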

Analysis of such polar angle plots has revealed that a particularly important factor governing the pattern of iota atom frequencies is the elemental nature and hybridisation state of the iota atoms, i.e. C sp3, C sp2 (aromatic), C sp2 (non-aromatic), N sp3, N sp2, O sp3, O sp2 or S. As a result, it is possible to improve analysis efficiency by grouping the 167 atom types according to these identified eight groups. In other embodiments, a different grouping may be used.

In general, environmental factors around the contact, such as the nature of adjacent amino acids, make less difference to the iota frequency pattern, with the exception of secondary structure. As might be expected, the frequency patterns of backbone amide nitrogen theta atoms versus backbone oxygen iota atoms and vice-versa are skewed by secondary structure, in particular as regards whether or not they are from beta sheet.

In an embodiment, the secondary database tags contact data with the local secondary structure type (helix, beta sheet or random coil). This provides the basis for differentiating any potential influence of secondary structure on contact patterns at a later stage.

In an embodiment, a method is provided based on the above that assists with the identification of modifications to a ligand that improve the strength of binding, or affinity, of the ligand to a binding site. In an embodiment, the method is used to assist with NCE or biologic drug design. In respect of the former, the method may be useful for predicting ‘hotspots’ or pharmacophore atom positions in potential drug binding sites of target proteins. This can facilitate de novo drug design. In situations where there is an available structure of chemical matter bound in a binding site, the method can suggest atom types and positions for elaboration of the chemistry to obtain a ligand with better binding characteristics. In the case of protein drugs such as antibodies, the method may be used to predict mutations in the protein or antibody binding site that would lead to improvement in binding affinity or specificity. The method may also be used to suggest positions for modification within a macromolecular structure to improve the properties of the macromolecule. For example, as illustrated in the Examples section below, the method may be used to identify point mutations within antibody VH and VL chains in order to improve the thermal stability of the antibody. The mutations are on separate chains, but are still within the antibody macromolecule.

FIG. 7 illustrates an example method for designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target.

In step S1, data representing the target binding site of a target protein is obtained, for example from a local or remote memory device 5. A target list of atoms forming the surface of the target binding site is identified.

In step S2, each atom in the target list is identified as a particular theta atom type.

In step S3, information is extracted from a structural database of biological macromolecules (e.g. the wwPDB), provided for example by a local or remote memory device 7, about non-bonding, intra-molecular or inter-molecular contacts in which the first atom in a contacting pair of atoms is a particular theta atom type and the opposing, second atom of the pair is a particular iota atom type. The extracted information comprises spatial and/or contextual data about the iota atom relative to the theta atom. The data is collected for a plurality of contacts of the given theta atom type and the resulting set of data is referred to as a theta contact set. In an embodiment, the theta contact set comprises data collected for all of the available contacts of the given theta atom type. The extracted information may form a database that is an example of the “secondary database” discussed above. In an embodiment, the information extracted in step S3 is collected in a secondary database that comprises one and only one theta contact set for each of the theta atom types. In an example of such an embodiment, the theta contact sets of the secondary database are subdivided into a plurality of non-overlapping iota atom types or non-overlapping groups of related iota atom types. In an example of such an embodiment, the database is sub-divided to form groups of identified contacts in which the first atom is one and only one of the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins and the second atom is in one and only one of a plurality of non-overlapping groups obtained by sorting the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins into groups based on chemical similarity.

In step S4, for each theta atom identified in the target list in step S2, data relating to a given iota atom type, or a predetermined group of related iota atom types, from the corresponding theta contact set extracted in step S3 is superimposed in or around the target binding site. In an embodiment, the superimposition comprises: parsing the theta contact set to extract spatial data for contacts comprising the given iota atom type or one or more of the predetermined group of related iota atom types; and plotting this spatial data to determine theoretical locations representing where each iota atom type, or each of the one or more of the predetermined group of related iota atom types, would be located if: i) the theta atom of the contact were located at the position of the corresponding theta atom in the target binding site; and ii) the third and fourth atoms of the contact were located at the positions of the third and fourth atoms of the corresponding theta atom in the target binding site. Where determined theoretical locations conflict with binding site atoms and/or are buried within the target protein, these may be removed from further analysis. For example, if it is determined that the theoretical location of an individual iota atom intersects with the location of an atom of the target macromolecule closer than Rw−0.2 angstroms then the iota atom is excluded from subsequent analysis.
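The superimposition and clash filtering of step S4 might be sketched as follows. This is an illustrative Python outline rather than the actual implementation: the helper names (`local_frame`, `place`, `superimpose`) are hypothetical, the normalised frame is assumed to have the third atom on the +x axis and the fourth atom in the x-z plane at z ≥ 0, and Rw is assumed to mean the sum of the van der Waals radii of the iota atom and the target atom, as in the wording used in Example 1.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def local_frame(theta, third, fourth):
    """Axes of the normalised frame expressed in target coordinates."""
    ex = unit(sub(third, theta))                 # third atom lies on +x
    v = sub(fourth, theta)
    d = dot(v, ex)
    ez = unit((v[0] - d * ex[0], v[1] - d * ex[1], v[2] - d * ex[2]))
    ey = cross(ez, ex)                           # right-handed: ex x ey = ez
    return ex, ey, ez

def place(norm_xyz, theta, frame):
    """Map a normalised iota position onto the target theta atom."""
    ex, ey, ez = frame
    x, y, z = norm_xyz
    return tuple(theta[i] + x * ex[i] + y * ey[i] + z * ez[i] for i in range(3))

def superimpose(contact_set, theta, third, fourth,
                target_atoms, iota_radius, tolerance=0.2):
    """Return theoretical iota locations that do not clash with the target.

    target_atoms is a list of (xyz, vdw_radius) pairs; a location is dropped
    if it is closer to any target atom than (sum of radii - tolerance).
    """
    frame = local_frame(theta, third, fourth)
    kept = []
    for p in contact_set:
        q = place(p, theta, frame)
        if all(math.dist(q, xyz) >= iota_radius + r - tolerance
               for xyz, r in target_atoms):
            kept.append(q)
    return kept
```

A test for whether a location is buried within the target protein is not included; that would need a separate surface or solvent-accessibility calculation.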

In step S5, the superimposed data is combined and/or parsed in such a way as to predict one or more favoured regions of the binding site where the given iota type, or the predetermined group of related iota atom types, has high theoretical propensity.

In step S6, a candidate ligand is notionally docked into the binding site. Data defining the candidate ligand may be provided for example from a local or remote memory device 9. A comparison is then made between the type and position of one or more of the atoms of the candidate ligand with the predicted favoured regions for the respective iota atom types. On the basis of the comparison, modifications to the candidate ligand, in terms of alternate or additional candidate ligand atoms, are identified that will produce a greater intersection between the alternate and/or additional candidate ligand atoms and the respective iota atom type favoured regions, leading to an improvement in the affinity of the modified candidate ligand to the binding site compared to the unmodified candidate ligand.

In step S7, the modified candidate ligand is output either as a proposed improvement to an existing ligand or as part of an ab initio design of a new ligand. Optionally steps S6 and S7 can be iterated to further modify the ligand. The local or remote memory devices 5, 7 and 9 may be implemented in a single piece of hardware (e.g. a single storage device) or in two or more different, separate devices.

In an embodiment, the modified candidate ligand is output to an output memory device for storage or transmission and/or to a display for visualization.

In an embodiment, the type of a given theta atom is identified uniquely in step S2 such that there is no intersection between the group of contacts for which information is extracted in step S3 for the given theta atom and the group of contacts for any other theta atom type (with the exception of contacts involving the given theta atom type as the iota atom).

In an embodiment, step S5 comprises determining one or more favoured regions for each of a plurality of different iota atom types and/or predetermined groups of related iota atom types. In such an embodiment, the comparison step S6 may be repeated for each of the plurality of different iota atom types and/or predetermined groups of related iota atom types, in order to identify potential modifications that involve the different iota atom types or predetermined groups of related iota atom types.

In an embodiment, steps S2-S7 are performed for a plurality of different atoms in the binding site. In an embodiment, as described below, favoured regions may be determined more accurately by cumulatively combining (e.g. summing) the distributions of determined theoretical locations of the iota atom types as derived for a plurality of different atoms in the binding site.

In an embodiment, the analysis is extended such that, for each favoured region, vectors are derived that describe the position of the fifth atom relative to its respective iota atom. Analysis is carried out on the vectors to identify a favoured bond vector representing a prediction of the covalent attachment of a theoretical consensus iota atom in the region. The identified favoured bond vector can then be used to refine the design of the candidate ligand and/or to refine the modification of the candidate ligand, as applicable. The identified favoured bond vector may be used for example to indicate how iota atoms in different favoured regions might be bonded together, thus assisting with the identification of modifications involving plural additions or exchanges of atoms. In an embodiment the analysis is a cluster analysis.

The distribution of theoretical locations gives a measure of the propensity for a particular iota atom type (or predetermined group of related iota atom types) to be favoured at different locations in the binding site. In an embodiment, a region in which a density of the theoretical locations is above a predetermined threshold is identified as one of the favoured regions. The density of theoretical locations is, for example, a measure of the number of determined theoretical locations that occur in a given spatial volume. In an embodiment, iota atom theoretical locations are determined for a plurality of target atoms in the binding site and a region in which a density of the cumulative theoretical locations for the iota atom (or predetermined group of related iota atom types) for the plurality of target atoms is above the predetermined threshold is identified as one of the favoured regions. The theoretical locations determined for different target atoms in the binding site may be summed, for example, in order to obtain the cumulative theoretical locations. This approach to taking into account the effects of different atoms in the binding site is computationally efficient and minimizes loss of information about the interaction between the candidate ligand and the binding site. The approach is facilitated by the characterization of contacts in terms of pairs of simple atom types or simple atom types in combination with atoms of predetermined groups of simple atom types. Such an approach is not valid when contacts are characterized in terms of 3-atom fragments, such as is the case in Laskowski et al. for example.
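One simple way to realise the cumulative density threshold described above is to bin the theoretical locations into cubic voxels and sum the counts over all target atoms. The Python sketch below is an assumption-laden illustration: the function names (`voxel`, `favoured_voxels`), the 1 Å voxel size and the threshold value are hypothetical and are not taken from the text.

```python
import math
from collections import Counter

def voxel(p, size):
    """Index of the cubic voxel of edge 'size' that contains point p."""
    return tuple(math.floor(c / size) for c in p)

def favoured_voxels(location_sets, size=1.0, threshold=3):
    """Sum theoretical locations over all target atoms and threshold them.

    location_sets holds one iterable of (x, y, z) theoretical locations per
    target (theta) atom, all for the same iota atom type or group. A voxel
    is reported as favoured when its cumulative count exceeds 'threshold'.
    """
    counts = Counter()
    for locations in location_sets:
        counts.update(voxel(p, size) for p in locations)
    return {v for v, n in counts.items() if n > threshold}
```

Dividing each count by the voxel volume and by the total number of locations would turn the counts into an empirical density, which is one possible starting point for the probability density functions mentioned in the text.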

The obtained distributions of theoretical locations can be transformed in various ways to create probability density functions, i.e. a statistical potential for the preference of a given iota atom type at a given position in the binding site. In turn, probability density functions can be treated in an analogous way to electron density and converted into CCP4 files, which are a standard way of visualising such maps within molecular graphics software, e.g. PyMOL.

In an embodiment, in step S5, the one or more favoured regions is/are expressed in polar coordinates, optionally comprising only the polar and azimuthal angles, optionally wherein the reference frame is normalized by reference to the third and fourth atoms 3, 4.

In an embodiment, step S6 comprises: identifying a modification of the candidate ligand that increases a degree of overlap between an atom of the candidate ligand (whether present before the modification or not) and a predicted favoured region for an atom of the same type in the binding site. In an embodiment, the generated distributions of theoretical locations and/or favoured regions are inspected, for example by computer software or manually, as superimpositions on the target macromolecular structure in complex with the respective candidate ligand. If for instance the candidate ligand relates to an antibody, the interface between the antibody and target macromolecule may be examined to determine the degree of overlap between antibody atoms and the respective iota atom theoretical location distributions and/or favoured regions identified for that atom. In some cases the degree of overlap will already be high. However, in other regions of the interface the overlap may be low. It is in these regions where mutations in the adjacent amino acid residue of the antibody may be most effectively identified/proposed. In an embodiment each of the 19 other natural amino acids is considered in turn at this position, in all of their respective rotamer conformations, and in each case the degree of overlap with the relevant iota atom theoretical location distributions and/or favoured regions is examined, with the aim of selecting those residues with the maximum degree of overlap for proposed mutations. In this manner, a rational means of selecting mutations is provided that may generate affinity improvements in the chosen antibody. Individual point mutations in different regions of the antibody-target protein interface may be generated; those that lead to affinity improvement can be tested in combinations of two or more that may give synergistic increases in affinity.

In an embodiment, step S6 comprises replacing each of one or more of (optionally all of) the amino acid residues of the ligand that is/are in direct contact with the target binding site, or in close proximity to the target binding site, with each of one or more of (optionally all of) the residues chosen from the other 19 natural amino acids. Each such replacement is referred to herein as a “residue replacement” and involves the modification of a ligand by a single replacement of one residue with a different residue.
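The enumeration of single residue replacements can be illustrated with a short Python generator. This is only a sketch of the bookkeeping: the function name `residue_replacements` and the position labels in the usage note are hypothetical, and rotamer sampling and clash checking are deliberately left out.

```python
# Three-letter codes of the 20 natural amino acids.
AMINO_ACIDS = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)

def residue_replacements(contact_residues):
    """Yield (position, original, candidate) for every single replacement.

    contact_residues is an iterable of (position, residue_name) pairs for
    ligand residues in contact with, or close to, the target binding site.
    Each residue is paired with each of the other 19 natural amino acids.
    """
    for position, original in contact_residues:
        for candidate in AMINO_ACIDS:
            if candidate != original:
                yield (position, original, candidate)
```

For two contact residues, for example `[("H52", "TYR"), ("L91", "SER")]` (labels invented for illustration), the generator yields 38 candidate replacements, each of which would then be modelled and compared with the favoured regions.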

In an embodiment, for each such residue replacement that does not cause conflict with adjacent atoms of the ligand or target (e.g. overlap between one or more atoms of the replacement residue and one or more other atoms of the target or ligand), the type and position of each atom of the replacement residue is compared with the respective iota atom type favoured regions to identify whether they will produce a greater intersection than the atoms of the original residue. In an embodiment a list is then output of the residue replacements that are identified as producing a greater intersection than atoms of the original residue. In an embodiment, for each of the listed residue replacements, the candidate ligand is then mutated to produce a modified, single-residue-mutated ligand that incorporates the residue replacement. The affinity of each modified ligand to the target binding site can then be tested by experiment in order to identify those modifications which provide the greatest affinity improvement for the candidate ligand. For example, a group of residue replacements may be identified that yield an affinity improvement that is greater than a predetermined threshold. In an embodiment, the predetermined threshold may be zero so that the selected group consists only of residue replacements that improve the affinity to some extent. More advanced modifications of the candidate ligand can then be carried out based on this information. For example, in an embodiment the candidate ligand may be modified to incorporate a plurality of residue replacements, for example a plurality of those residue replacements that, individually, were determined as providing the greatest affinity improvements. In this way it is possible to design a ligand that has an affinity that is improved even more than is possible by replacing only a single residue.

In some embodiments, lists may be produced of residue replacements that provide a greater intersection than atoms of the original residue (i.e. that have a ΔIOTAscore of less than zero). The lists may additionally be ranked or filtered based on other criteria. For example, lists may also be filtered based on ΔΔG scores (see below). Residue replacements with ΔΔG scores of less than zero imply stronger interactions compared with the original residue. Therefore a list may be produced in which the residue replacements satisfy both criteria: a ΔIOTAscore of less than zero (a negative ΔIOTAscore) and a ΔΔG of less than zero (a negative ΔΔG). This is illustrated in Example 2 below.
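The two-criterion filter described above amounts to a simple selection over scored replacements. In the Python sketch below the record type `ScoredReplacement`, the function `shortlist` and the ordering by ΔIOTAscore are assumptions made for illustration; how the two scores are actually computed is not shown here.

```python
from typing import NamedTuple

class ScoredReplacement(NamedTuple):
    mutation: str       # free-text label for the residue replacement
    delta_iota: float   # ΔIOTAscore (negative = greater intersection)
    ddg: float          # ΔΔG (negative = stronger interaction)

def shortlist(replacements):
    """Keep replacements with both scores below zero.

    The result is ordered with the most negative ΔIOTAscore first; this
    ordering is an illustrative choice, not one prescribed by the text.
    """
    keep = [r for r in replacements if r.delta_iota < 0 and r.ddg < 0]
    return sorted(keep, key=lambda r: r.delta_iota)
```

A replacement that improves only one of the two scores is dropped, which mirrors the requirement that both a negative ΔIOTAscore and a negative ΔΔG be satisfied.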

In the case of chemical matter, for instance a crystal structure of a low molecular weight chemical fragment bound in a pocket of a target protein, iota atom theoretical location distributions and/or favoured regions displayed in the binding site may suggest atom types and vectors of chemical bonds for fragment growth that may yield a prototype NCE of higher potency.

In an embodiment, a plurality of modifications to the candidate ligand are identified. In this case, the method may further comprise selecting a subset of the identified modifications, for example to identify the modifications which are likely to be most effective in terms of improving affinity. The selection may be carried out based on the extent to which the intersection between the alternate and/or additional candidate ligand atoms and the respective iota atom type favoured regions is greater compared to the unmodified candidate ligand. For example, modifications that result in an increase in the intersection that is above a predetermined threshold may be selected and modifications that result in an increase in the intersection that is below a predetermined threshold may be discarded. An example of such a selection process is discussed below in the context of “Example 2”. The “ΔIOTAscore” is an example of a measure of the extent to which the intersection between the alternate and/or additional candidate ligand atoms and the respective iota atom type favoured regions is greater compared to the unmodified candidate ligand. Alternatively or additionally, the selection may be carried out based on the extent to which one or more factors contributing to the total energy of the complex formed by the binding of the modified candidate ligand to the binding site is/are reduced compared to the case where the unmodified candidate ligand is bound. For example, modifications that result in a decrease in the one or more factors (e.g. a decrease in a sum of the one or more factors) that is above a predetermined threshold may be selected and modifications that result in a decrease in the one or more factors (e.g. a decrease in a sum of the one or more factors) that is below a predetermined threshold may be discarded. An example of such a selection process is discussed below in the context of “Example 2”. The “RosettaΔΔG score” is an example of a measure of the extent to which one or more factors contributing to the total energy of the complex formed by the binding of the candidate ligand to the binding site is/are reduced. Examples of factors contributing to the total energy of the complex include a Lennard-Jones term, an implicit solvation term, an orientation-dependent hydrogen bond term, sidechain and backbone torsion potentials derived from the PDB, a short-ranged knowledge-based electrostatic term, and reference energies for each of the 20 amino acids that model the unfolded state, as discussed below.

In an embodiment, the method of identifying a modification to a candidate ligand is a computer-implemented method. In an embodiment, any one or more of the steps S1-S7 is/are performed on a computer. In an embodiment, all of the steps S1-S7 are performed on a computer. In addition, any one or more of the steps S101-S109 of FIG. 14 (illustrating a workflow for predicting point mutations at the VH-VL interface of an antibody) may be automated. Any one or more of the steps S101-S109 may be performed on a computer. In one embodiment, all of the steps S101-S109 are automated. All of the steps S101-S109 may be performed on a computer.

A wide range of standard computing configurations, well known to the person skilled in the art, could be used as platforms to implement the method. The method is not limited to any particular hardware configuration, operating system or means for storing or transmitting software necessary for defining and/or implementing the method steps. In an embodiment, a computer readable medium or signal is provided that comprises computer readable instructions (e.g. code in a computer programming language) for causing a computer to carry out the method.

In an embodiment, a method of manufacturing a therapeutic ligand is provided. In an embodiment, the method of manufacturing comprises designing a new ligand or modifying an existing ligand according to one or more of the embodiments described above.

EXAMPLE 1 Affinity Maturation of a Fab Fragment of an Anti-IL17F Antibody

Introduction

In vitro methods of antibody affinity maturation are well known (see U.S. Pat. No. 8,303,953 B2 column 13 lines 19 to 33). In a recent example, Fujino et al. (Fujino et al. (2012) “Robust in vitro affinity maturation strategy based on interface-focused high-throughput mutational scanning”, Biochem. Biophys. Res. Comm., 428, 395-400) report a high throughput mutational scanning strategy based on ribosome display panning of single point mutant single-chain Fab libraries at each of 50 identified antigen interface residues of the antibody, followed by combinatorial ribosome display of enhanced binders, that resulted in identification of a Fab with over 2000-fold affinity improvement. Such methods require a large investment in laboratory based resources and therefore various groups (reviewed by Kuroda et al. (2012) “Computer-aided antibody design”, Prot. Eng. Design & Selection, 25, 507-521) have investigated in silico methods that predict improvements in antibody affinity so as to reduce or eliminate the need for screening large numbers of mutated antibody variants for improved affinity. These computer-aided antibody design protocols are either knowledge-based, i.e. using statistical potentials derived from observational data, or physics-based, i.e. using energy functions derived from models of the underlying physical interactions. Lippow et al. (Lippow et al. (2007), “Computational design of antibody-affinity improvement beyond in vivo maturation”, Nat Biotech., 25, 1171-1176) have achieved moderate success with the latter approach based on electrostatic interactions, but our understanding of parameterisation of such methods is still far from complete. Knowledge-based methods to date tend to identify individual antibody residues for random mutagenesis (e.g. Barderas et al. (2008) “Affinity maturation of antibodies assisted by in silico modelling”, PNAS, 105, 9029-9034), which still entails considerable laboratory based effort.

In this Example, a knowledge-based approach was applied to affinity mature a Fab fragment of the anti-IL17F antibody described in U.S. Pat. No. 8,303,953 B2. The final affinity matured antibody is described in WO 2012/095662 A1 as a full length IgG1 molecule. However, the method by which this antibody was affinity matured is not disclosed in the latter publication.

Methods

Identification of Target (Theta) Atoms Comprising the IL17F Epitope

Using the coordinates of the co-crystal IL17F/Fab 496 complex structure described in WO 2009/130459 A2, all IL17F atoms within 6 Å of any Fab 496 atom were identified as epitope atoms and are listed in Table 1. The 209 theta atoms in this list represent 86 distinct theta atom types.
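The 6 Å selection can be expressed as a simple distance filter. The following Python sketch is illustrative only: the function name `epitope_atoms` and the toy coordinates in the usage note are hypothetical, the cutoff is treated as inclusive by assumption, and a real structure would normally be read with a PDB parser and searched with a spatial index rather than by brute force.

```python
import math

def epitope_atoms(antigen_atoms, antibody_atoms, cutoff=6.0):
    """Return labels of antigen atoms within 'cutoff' Å of any antibody atom.

    antigen_atoms:  list of (label, (x, y, z)) pairs, labels written in the
                    chain-residue-number-atom notation used in Table 1.
    antibody_atoms: list of (x, y, z) coordinates of the Fab atoms.
    """
    return [
        label
        for label, xyz in antigen_atoms
        if any(math.dist(xyz, other) <= cutoff for other in antibody_atoms)
    ]
```

For instance, with an invented antigen atom `("F-ASN-53-CG", (0.0, 0.0, 0.0))` and a single Fab atom at `(0.0, 0.0, 5.0)`, the atom is returned as part of the epitope; moving the Fab atom to `(0.0, 0.0, 7.0)` excludes it.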

TABLE 1 List of atoms comprising the IL17F epitope, where the notation (1)-(2)-(3)-(4) designates (1) the respective F and I chains of the IL17 homodimer, (2) the amino acid residue, (3) the residue number and (4) the atom type.

F-ASN-53-CG | F-ASN-89-C | F-ILE-129-CA | I-VAL-38-N
F-ASN-53-OD1 | F-ASN-89-CA | F-ILE-129-CB | I-VAL-38-O
F-ALA-70-C | F-ASN-89-CB | F-ILE-129-CD1 | I-SER-39-C
F-ALA-70-O | F-ASN-89-CG | F-ILE-129-CG1 | I-SER-39-CA
F-GLN-71-C | F-ASN-89-N | F-ILE-129-CG2 | I-SER-39-CB
F-GLN-71-CA | F-ASN-89-ND2 | F-ILE-129-N | I-SER-39-N
F-GLN-71-CB | F-ASN-89-OD1 | F-ILE-129-O | I-SER-39-OG
F-GLN-71-CD | F-SER-90-C | F-HIS-130-C | I-MET-40-C
F-GLN-71-CG | F-SER-90-CA | F-HIS-130-N | I-MET-40-N
F-GLN-71-N | F-SER-90-CB | F-HIS-130-O | I-MET-40-O
F-GLN-71-NE2 | F-SER-90-N | F-HIS-131-C | I-SER-41-C
F-GLN-71-O | F-SER-90-O | F-HIS-131-CA | I-SER-41-CA
F-GLN-71-OE1 | F-SER-90-OG | F-HIS-131-CE1 | I-SER-41-CB
F-CYS-72-C | F-VAL-91-C | F-HIS-131-N | I-SER-41-N
F-CYS-72-CA | F-VAL-91-CA | F-HIS-131-ND1 | I-SER-41-O
F-CYS-72-CB | F-VAL-91-CB | F-HIS-131-O | I-SER-41-OG
F-CYS-72-N | F-VAL-91-CG1 | F-VAL-132-C | I-ARG-42-C
F-CYS-72-O | F-VAL-91-CG2 | F-VAL-132-CA | I-ARG-42-CA
F-CYS-72-SG | F-VAL-91-N | F-VAL-132-CB | I-ARG-42-CB
F-ARG-73-C | F-VAL-91-O | F-VAL-132-CG1 | I-ARG-42-CD
F-ARG-73-CA | F-PRO-92-C | F-VAL-132-CG2 | I-ARG-42-CG
F-ARG-73-CB | F-PRO-92-CA | F-VAL-132-N | I-ARG-42-CZ
F-ARG-73-CG | F-PRO-92-CB | F-VAL-132-O | I-ARG-42-N
F-ARG-73-N | F-PRO-92-CD | F-GLN-133-C | I-ARG-42-NE
F-ARG-73-O | F-PRO-92-CG | F-GLN-133-CA | I-ARG-42-NH1
F-ASN-74-C | F-PRO-92-N | F-GLN-133-CB | I-ARG-42-NH2
F-ASN-74-CA | F-PRO-92-O | F-GLN-133-CD | I-ARG-42-O
F-ASN-74-CB | F-GLN-94-CD | F-GLN-133-CG | I-ASN-43-N
F-ASN-74-CG | F-GLN-94-NE2 | F-GLN-133-N | I-ASN-43-OD1
F-ASN-74-N | F-GLN-94-OE1 | F-GLN-133-O | I-ILE-44-CA
F-ASN-74-ND2 | F-GLU-114-OE1 | F-GLN-133-OE1 | I-ILE-44-CB
F-ASN-74-O | F-LEU-117-CB | F-GLN-133-OXT | I-ILE-44-CD1
F-LEU-75-C | F-LEU-117-CD1 | I-ILE-32-CG2 | I-ILE-44-CG1
F-LEU-75-CA | F-LEU-117-CD2 | I-ASN-33-CB | I-ILE-44-CG2
F-LEU-75-CB | F-LEU-117-CG | I-ASN-33-CG | I-ARG-47-CD
F-LEU-75-CD1 | F-THR-119-CB | I-ASN-33-ND2 | I-ARG-47-CZ
F-LEU-75-CD2 | F-THR-119-CG2 | I-ASN-33-OD1 | I-ARG-47-NE
F-LEU-75-CG | F-THR-119-OG1 | I-GLN-36-C | I-ARG-47-NH1
F-LEU-75-N | F-VAL-125-CB | I-GLN-36-CA | I-ARG-47-NH
F-LEU-75-O | F-VAL-125-CG1 | I-GLN-36-CB |
F-GLU-84-OE1 | F-VAL-125-CG2 | I-GLN-36-CD |
F-ILE-86-C | F-PRO-127-C | I-GLN-36-NE2 |
F-ILE-86-CA | F-PRO-127-CA | I-GLN-36-OE1 |
F-ILE-86-CB | F-PRO-127-CB | I-ARG-37-C |
F-ILE-86-CD1 | F-PRO-127-CD | I-ARG-37-CA |
F-ILE-86-CG1 | F-PRO-127-CG | I-ARG-37-CB |
F-ILE-86-CG2 | F-PRO-127-N | I-ARG-37-CD |
F-ILE-86-O | F-PRO-127-O | I-ARG-37-CG |
F-SER-87-C | F-VAL-128-C | I-ARG-37-CZ |
F-SER-87-CA | F-VAL-128-CA | I-ARG-37-N |
F-SER-87-N | F-VAL-128-CB | I-ARG-37-NE |
F-SER-87-O | F-VAL-128-CG1 | I-ARG-37-NH1 |
F-MET-88-C | F-VAL-128-CG2 | I-ARG-37-NH2 |
F-MET-88-CA | F-VAL-128-N | I-ARG-37-O |
F-MET-88-N | F-VAL-128-O | I-VAL-38-C |
F-MET-88-O | F-ILE-129-C | I-VAL-38-CA |

Creation of IOTA Database

A secondary database of over 11 million intra-molecular atomic contacts was extracted from over 20000 non-homologous protein structures where resolution was ≤2 Å. Contacts were defined as any two atoms on opposing sides of a protein fold separated by a distance of 1 Å plus the sum of their respective van der Waals radii, or less, and were limited to atoms on residues at least 4 residues apart on the linear peptide sequence. The first atom of the contacting pair was designated the theta atom and the second atom, the iota atom. The database was divided into 167 contact sets according to the theta atom type, there being 167 non-hydrogen atom types within the 20 natural amino acid residues comprising proteins. Within a contact set the relative coordinates of each iota atom position were recorded after normalisation of the theta-iota atom pair coordinates. The latter was achieved by setting the theta atom to x,y,z=0,0,0; the next covalently attached atom (3rd atom) to the theta atom (in the direction of the peptide backbone) to x,y,z=x′,0,0; and the next again covalently attached atom (4th atom) to x,y,z=x″,0,z′. A consistent convention was employed to define the 3rd and 4th atoms.
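The contact criterion described above (distance no greater than 1 Å plus the sum of the van der Waals radii, and a sequence separation of at least 4 residues) can be sketched as a small Python predicate. The function name `is_contact`, the dictionary layout of an atom record and the radii in `VDW_RADII` are assumptions for illustration; the radii are commonly quoted Bondi values and are not necessarily those used to build the database. The additional requirement that the atoms lie on opposing sides of a protein fold is not modelled.

```python
import math

# Illustrative van der Waals radii in Å (Bondi values, an assumption here).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def is_contact(atom_a, atom_b, radii=VDW_RADII, margin=1.0, min_separation=4):
    """Return True if two atoms of the same chain satisfy the contact test.

    Each atom is a dict with keys 'element', 'res_index' (position in the
    linear peptide sequence) and 'xyz' (Cartesian coordinates in Å).
    """
    # Residues must be at least 'min_separation' apart in sequence.
    if abs(atom_a["res_index"] - atom_b["res_index"]) < min_separation:
        return False
    # Distance must not exceed the sum of radii plus the 1 Å margin.
    limit = radii[atom_a["element"]] + radii[atom_b["element"]] + margin
    return math.dist(atom_a["xyz"], atom_b["xyz"]) <= limit
```

Each ordered pair that passes the test would contribute one record to the theta contact set of the first atom, with the second atom recorded as the iota atom.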

Each theta contact set was further sub-divided into 167 iota atom types, but for convenience these were concatenated into 26 sub-groups according to chemical type based on the definition of Engh and Huber (Engh and Huber (1991) “Accurate Bond and Angle Parameters for X-ray Protein Structure Refinement”, Acta Cryst., A47, 392-400).

In this Example the following iota sub-groups were employed:

Description | Atoms
Carbonyl O | all backbone O, Asn Oδ1, Gln Oε1
Tetrahedral CH₂ | all Cβ (except Ala, Ile, Thr, Val), Arg Cδ, Cγ, Gln Cγ, Glu Cγ, Ile Cγ1, Lys Cγ, Cδ, Cε, Met Cγ
Tetrahedral CH₃ | Ala Cβ, Ile Cδ1, Cγ2, Leu Cδ1, Cδ2, Met Cε, Thr Cγ2, Val Cγ1, Cγ2
NH | all backbone N (except Pro), Arg Nε, His Nδ1, Nε2, Trp Nε1
Hydroxyl O | Ser Oγ, Thr Oγ1, Tyr Oη
Carboxyl O | Asp Oδ1, Oδ2, Glu Oε1, Oε2
NH₂ | Asn Nδ2, Gln Nε2

Superimposition of Iota Data Over the IL17F Epitope Surface

For each of the 209 theta atoms comprising the IL17F epitope, the corresponding theta contact set was selected from the IOTA database and from that, an appropriate iota sub-group was selected, e.g. carbonyl oxygen. The relative iota coordinates from this sub-group were transposed relative to the reference frame of the given theta atom of the IL17F epitope (Table 2 illustrates example data). An iota dataset for a given sub-group was thus accumulated over the whole IL17F epitope. Where the location of a given iota data point came closer to an atom of IL17F than the sum of their respective van der Waals radii minus 0.2 Å, that data point was excluded from the dataset. The process was repeated for all relevant iota sub-groups to produce a series of iota datasets for the IL17F epitope.

TABLE 2

id | out01:0000098 | out01:0000127 | out01:0000249 | out01:0000280 | out01:0000282 | out01:0000283 | out01:0000346 | out01:0000366 | out01:0000527 | out01:0000542
pdbcode | 16vpA | 16vpA | 1a0cA | 1a0pA | 1a0pA | 1a0pA | 1a0sP | 1a0sP | 1a12A | 1a12A
res | 2.1 | 2.1 | 2.5 | 2.5 | 2.5 | 2.5 | 2.4 | 2.4 | 1.7 | 1.7
rval | 0.26 | 0.26 | 0.177 | 0.287 | 0.287 | 0.287 | 0.228 | 0.228 | 0.219 | 0.219
org | Herpes simplex | Herpes simplex | Thermoanaero | Escherichia coli | Escherichia coli | Escherichia coli | Salmonella typ | Salmonella typ | Homo sapiens | Homo sapiens
Tchain | A | A | A | A | A | A | P | P | A | A
Taaind | 96 | 235 | 36 | 126 | 137 | 137 | 85 | 138 | 257 | 12
Taanum | 142 | 281 | 36 | 145 | 156 | 156 | 155 | 208 | 277 | 32
Taaname | THR | THR | THR | THR | THR | THR | THR | THR | THR | THR
Taass | C | E | E | H | E | E | H | E | C | C
Taaphi | −135.7 | −129.6 | −71.7 | −87.1 | −105.2 | −105.2 | −118.5 | −123.9 | −114.7 | −127.1
Taapsi | 163.3 | 160 | 157.7 | −34.9 | 174 | 174 | 18.9 | 132.4 | 1 | −10.7
Tomg | −179.5 | 178.5 | 177.5 | 178.7 | 177.6 | 177.6 | −178.3 | 179.3 | −178.5 | −171.7
Tc1 | 61.3 | 52.2 | 66.6 | 60.5 | 64.4 | 64.4 | 53.1 | −54.5 | 49.2 | 59.2
Tc2 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9
Tc3 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9
Tc4 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9
TUpAA | ALA | LEU | LYS | ALA | LEU | LEU | SER | GLY | GLY | SER
TDnAA | ARG | VAL | MET | GLY | MET | MET | SER | GLY | GLU | GLU
Tnum | 724 | 1867 | 288 | 1032 | 1111 | 1111 | 626 | 1048 | 1919 | 93
Tname | C | C | C | C | C | C | C | C | C | C
Tbval | 23.52 | 24.07 | 23.92 | 31.8 | 38.02 | 38.02 | 16.02 | 15.07 | 20.11 | 12.34
Tasa | 0 | 0 | 0 | 0 | 0 | 0 | 1.84 | 0.19 | 0 | 0.21
Tcdist | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Todist | 1.234 | 1.233 | 1.24 | 1.215 | 1.239 | 1.239 | 1.226 | 1.222 | 1.247 | 1.249
Tndist | 2.463 | 2.461 | 2.432 | 2.482 | 2.455 | 2.455 | 2.536 | 2.412 | 2.492 | 2.502
Tcadist | 1.525 | 1.532 | 1.527 | 1.525 | 1.53 | 1.53 | 1.541 | 1.519 | 1.516 | 1.521
Icdist | 3.472 | 3.069 | 3.359 | 3.618 | 3.647 | 3.199 | 3.565 | 3.589 | 3.432 | 3.67
Iodist | 3.677 | 3.072 | 4.523 | 3.286 | 3.657 | 4.381 | 2.796 | 2.512 | 3.207 | 3.652
Indist | 4.507 | 4.556 | 4.306 | 4.708 | 4.178 | 4.537 | 5.735 | 3.963 | 4.235 | 4.639
Itdist | 3.472 | 3.069 | 3.359 | 3.618 | 3.647 | 3.199 | 3.565 | 3.589 | 3.432 | 3.67
Inum | 1644 | 1696 | 244 | 2089 | 1535 | 1519 | 550 | 1213 | 1845 | 372
Iname | CD2 | OD1 | O | CD1 | CB | O | NE2 | N | O | CG1
Ichain | A | A | A | A | A | A | P | P | A | A
Iaaind | 206 | 213 | 30 | 258 | 190 | 188 | 75 | 159 | 247 | 50
Iaanum | 252 | 259 | 30 | 279 | 209 | 207 | 145 | 229 | 267 | 70
Iaaname | LEU | ASP | GLU | TYR | LEU | ASP | GLN | GLY | SER | VAL
Iaass | H | E | C | H | C | C | E | E | C | E
Iaaphi | −94.9 | −110.9 | −60.4 | −69.5 | −73.2 | −83.2 | −125.6 | −109.2 | −149.7 | −101.1
Iaapsi | −34.2 | 127.4 | 132.7 | −18.6 | −32.6 | −14.4 | 142.7 | 141.1 | 38.2 | −21
Iomg | −173.5 | −178.2 | −179.6 | −177.8 | −177.1 | −179.8 | 177.8 | 178.2 | −179.1 | 179.5
Ic1 | −63.9 | −171.8 | 176.2 | −79.4 | 167.9 | 66 | −66.4 | 999.9 | −164 | −65.2
Ic2 | 167.9 | 57.6 | 172.1 | −57.1 | 69.8 | −21.2 | −63.5 | 999.9 | 999.9 | 999.9
Ic3 | 999.9 | 999.9 | 7 | 999.9 | 999.9 | 999.9 | −52.7 | 999.9 | 999.9 | 999.9
Ic4 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9 | 999.9
IUpAA | ASP | CYS | GLU | ILE | VAL | ILE | GLY | TYR | LEU | VAL
IDnAA | PHE | LEU | VAL | THR | PHE | VAL | THR | ARG | ASN | GLN
Ibval | 6.09 | 32.79 | 27.18 | 37.22 | 8.29 | 43.39 | 11.84 | 20.76 | 9.22 | 8.96
Iasa | 0.13 | 0 | 0 | 0 | 0 | 0 | 0.69 | 0.25 | 0 | 1.04
K1cdist | 4.988 | 4.114 | 4.557 | 4.099 | 4.209 | 4.342 | 4.034 | 4.42 | 3.962 | 4.187
K1odist | 5.113 | 3.819 | 5.728 | 3.526 | 4.6 | 5.553 | 3.425 | 3.283 | 3.395 | 4.575
K1ndist | 5.733 | 5.402 | 5.198 | 5.704 | 4.874 | 5.334 | 5.781 | 4.513 | 5.257 | 4.346
K1tdist | 4.988 | 4.114 | 4.557 | 4.099 | 4.209 | 4.342 | 4.034 | 4.42 | 3.962 | 4.187
K1num | 1642 | 1695 | 243 | 2088 | 1532 | 1518 | 548 | 1214 | 1844 | 371
K1name | CG | CG | C | CG | CA | C | CD | CA | C | CB
K1aanum | 252 | 259 | 30 | 279 | 209 | 207 | 145 | 229 | 267 | 70
K1aaname | LEU | ASP | GLU | TYR | LEU | ASP | GLN | GLY | SER | VAL
K2cdist | NoValue | NoValue | NoValue | 3.857 | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
K2odist | NoValue | NoValue | NoValue | 3.951 | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
K2ndist | NoValue | NoValue | NoValue | 4.353 | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
K2tdist | NoValue | NoValue | NoValue | 3.857 | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
K2num | NoValue | NoValue | NoValue | 2091 | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
K2name | NoValue | NoValue | NoValue | CE1 | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
K2aanum | NoValue | NoValue | NoValue | 279 | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
K2aaname | NoValue | NoValue | NoValue | TYR | NoValue | NoValue | NoValue | NoValue | NoValue | NoValue
Cx | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Cy | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Cz | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
CsphR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
CsphT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
CsphP | −3.142 | 0 | 0 | 3.142 | 0 | 0 | 0 | 0 | 0 | 0
CcylR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
CcylT | −3.142 | 0 | 0 | 3.142 | 0 | 0 | 0 | 0 | 0 | 0
CcylZ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Ox | −0.628 | −0.629 | −0.648 | −0.659 | −0.666 | −0.666 | −0.609 | −0.619 | −0.609 | −0.582
Oy | 0.998 | 1.004 | 0.978 | −0.754 | 1.028 | 1.028 | −1.009 | 0.713 | −1.084 | −1.095
Oz | 0.364 | 0.341 | 0.401 | −0.689 | 0.182 | 0.182 | 0.341 | 0.777 | 0.085 | −0.146
OsphR | 1.234 | 1.233 | 1.24 | 1.216 | 1.238 | 1.238 | 1.227 | 1.223 | 1.246 | 1.249
OsphT | 1.271 | 1.291 | 1.241 | 2.173 | 1.423 | 1.423 | 1.289 | 0.882 | 1.503 | 1.688
OsphP 2.132 2.13 2.156 −2.289 2.146 2.146−2.114 2.286 −2.083 −2.059 OcylR 1.234 1.233 1.24 1.216 1.238 1.2381.227 1.223 1.246 1.249 OcylT 2.132 2.13 2.156 −2.289 2.146 2.146 −2.1142.286 −2.083 −2.059 OcylZ 0.364 0.341 0.401 −0.689 0.182 0.182 0.3410.777 0.085 −0.146 Nx 2.055 2.054 2.006 2.096 2.039 2.039 2.158 1.9822.11 2.128 Ny 1.357 1.356 1.375 1.328 1.368 1.368 1.332 1.374 1.3271.316 Nz 0 0 0 0 0 0 0 0 0 0 NsphR 2.463 2.461 2.432 2.481 2.455 2.4552.536 2.412 2.493 2.502 NsphT 1.571 1.571 1.571 1.571 1.571 1.571 1.5711.571 1.571 1.571 NsphP 0.584 0.583 0.601 0.565 0.591 0.591 0.553 0.6060.561 0.554 NcylR 2.463 2.461 2.432 2.481 2.455 2.455 2.536 2.412 2.4932.502 NcylT 0.584 0.583 0.601 0.565 0.591 0.591 0.553 0.606 0.561 0.554NcylZ 0 0 0 0 0 0 0 0 0 0 Cax 1.525 1.532 1.527 1.525 1.53 1.53 1.5411.519 1.516 1.521 Cay 0 0 0 0 0 0 0 0 0 0 Caz 0 0 0 0 0 0 0 0 0 0 CasphR1.525 1.532 1.527 1.525 1.53 1.53 1.541 1.519 1.516 1.521 CasphT 1.5711.571 1.571 1.571 1.571 1.571 1.571 1.571 1.571 1.571 CasphP 0 0 0 0 0 00 0 0 0 CacylR 1.525 1.532 1.527 1.525 1.53 1.53 1.541 1.519 1.516 1.521CacylT 0 0 0 0 0 0 0 0 0 0 CacylZ 0 0 0 0 0 0 0 0 0 0 Tx −12.209 3.1675.678 17.53 11.229 11.229 −34.605 −58.479 39.484 5.281 Ty −17.125−7.179 45.43 −33.077 −23.148 −23.148 −6.201 −0.92 −28.232 −11.189 Tz−3.618 −14.31 44.154 38.22 26.483 26.483 −24.486 −14.58 0.224 9.116TsphR TsphT TsphP TcylR TcylT TcylZ Ix −0.948 −0.804 1.174 −1.669 −0.2970.852 −2.689 −1.363 1.533 −0.362 Iy 0.629 −0.729 −2.203 1.536 1.127−2.85 −0.806 3.054 −2.417 −0.092 Iz −3.281 2.871 −2.247 −2.818 −3.455−1.176 2.198 1.302 −1.893 −3.651 IsphR 3.473 3.069 3.359 3.617 3.6463.199 3.565 3.589 3.432 3.67 IsphT 2.808 0.361 2.304 2.464 2.816 1.9470.907 1.2 2.155 3.04 IsphP 2.556 −2.405 −1.081 2.398 1.828 −1.28 −2.851.991 −1.006 −2.893 IcylR 3.473 3.069 3.359 3.617 3.646 3.199 3.5653.589 3.432 3.67 IcylT 2.556 −2.405 −1.081 2.398 1.828 −1.28 −2.85 1.991−1.006 −2.893 IcylZ −3.281 2.871 −2.247 −2.818 −3.455 −1.176 2.198 1.302−1.893 
−3.651 K1x −1.194 −1.35 1.899 −2.979 0.085 1.65 −2.134 −1.060.644 0.446 K1y 1.1 −0.243 −2.896 1.098 −0.13 −3.76 −0.567 3.344 −3.1821.142 K1z −4.717 3.878 −2.962 −2.593 −4.206 −1.413 3.376 2.69 −2.271−4.003 K1sphR 4.989 4.113 4.557 4.099 4.209 4.342 4.034 4.421 3.9624.187 K1sphT 2.81 0.34 2.278 2.256 3.105 1.902 0.579 0.917 2.181 2.844K1sphP 2.397 −2.963 −0.99 2.788 −0.992 −1.157 −2.882 1.878 −1.371 1.198K1cylR 4.989 4.113 4.557 4.099 4.209 4.342 4.034 4.421 3.962 4.187K1cylT 2.397 −2.963 −0.99 2.788 −0.992 −1.157 −2.882 1.878 −1.371 1.198K1cylZ −4.717 3.878 −2.962 −2.593 −4.206 −1.413 3.376 2.69 −2.271 −4.003K2x NoValue NoValue NoValue −1.258 NoValue NoValue NoValue NoValueNoValue NoValue K2y NoValue NoValue NoValue 2.771 NoValue NoValueNoValue NoValue NoValue NoValue K2z NoValue NoValue NoValue −2.37NoValue NoValue NoValue NoValue NoValue NoValue K2sphR 3.857 K2sphT2.232 K2sphP 1.997 K2cylR 3.857 K2cylT 1.997 K2cylZ −2.37 inter MS MS MMMS MS MM MS MM MM MS Icount 1 6 3 4 4 4 2 4 4 2 Tcount 1 1 1 1 2 2 1 1 11

indicates data missing or illegible when filed

TABLE 2 Key

Header    Description
Id        Incremented identifier of a contact. Every unique identifier is a unique theta-iota interaction. The id is eight digits with place-holding 0s (ex. 1 is 00000001).
pdbcode   The PDB code that was given in the input file (ex. 1mu4B for chain B of 1mu4, or 1mu4 for the structure).
res       The resolution of the PDB structure.
rval      The R-value of the PDB structure.
org       The organism source for the PDB structure.
Tchain    The chain identifier for the theta atom.
Taaind    The amino acid index for the theta amino acid (index starts at 1 for the first amino acid in the structure and increments for each amino acid).
Taanum    The amino acid number for the theta atom.
Taaname   The amino acid name for the theta atom.
Taass     The amino acid secondary structure for the theta atom.
Taaphi    The amino acid phi angle for the theta atom.
Taapsi    The amino acid psi angle for the theta atom.
Tomg      The amino acid omega angle for the theta atom.
Tchi1     The amino acid chi 1 angle for the theta atom.
Tchi2     The amino acid chi 2 angle for the theta atom.
Tchi3     The amino acid chi 3 angle for the theta atom.
Tchi4     The amino acid chi 4 angle for the theta atom.
TUpAA     The amino acid which is upstream of the theta amino acid.
TDnAA     The amino acid which is downstream of the theta amino acid.
Tnum      The atom number for the theta atom.
Tname     The atom name for the theta atom.
Tbval     The B value or temperature factor for the theta atom.
Tasa      The Accessible Surface Area of the theta atom.
Tcdist    The distance from the theta atom to the backbone Carbon.
Todist    The distance from the theta atom to the backbone Oxygen.
Tndist    The distance from the theta atom to the backbone Nitrogen.
Tcadist   The distance from the theta atom to the backbone Alpha Carbon.
Icdist    The distance from the iota atom to the backbone Carbon of the theta amino acid.
Iodist    The distance from the iota atom to the backbone Oxygen of the theta amino acid.
Indist    The distance from the iota atom to the backbone Nitrogen of the theta amino acid.
Itdist    The distance from the iota atom to the theta atom.
Inum      The atom number for the iota atom.
Iname     The atom name for the iota atom.
Ichain    The chain identifier for the iota atom.
Iaaind    The amino acid index for the iota amino acid (index starts at 1 for the first amino acid in the structure and increments for each amino acid).
Iaanum    The amino acid number for the iota atom.
Iaaname   The amino acid name for the iota atom.
Iaass     The amino acid secondary structure for the iota atom.
Iaaphi    The amino acid phi angle for the iota atom.
Iaapsi    The amino acid psi angle for the iota atom.
Iomg      The amino acid omega angle for the iota atom.
Ichi1     The amino acid chi 1 angle for the iota atom.
Ichi2     The amino acid chi 2 angle for the iota atom.
Ichi3     The amino acid chi 3 angle for the iota atom.
Ichi4     The amino acid chi 4 angle for the iota atom.
IUpAA     The amino acid which is upstream of the iota amino acid.
IDnAA     The amino acid which is downstream of the iota amino acid.
Ibval     The B value or temperature factor for the iota atom.
Iasa      The Accessible Surface Area for the iota atom.
K1cdist   The distance from the first kappa atom to the backbone Carbon of the theta amino acid.
K1odist   The distance from the first kappa atom to the backbone Oxygen of the theta amino acid.
K1ndist   The distance from the first kappa atom to the backbone Nitrogen of the theta amino acid.
K1tdist   The distance from the first kappa atom to the theta atom.
K1num     The atom number for the first kappa atom.
K1name    The atom name for the first kappa atom.
K1aanum   The amino acid number for the first kappa atom.
K1aaname  The amino acid name for the first kappa atom.
K2cdist   The distance from the second kappa atom to the backbone Carbon of the theta amino acid.
K2odist   The distance from the second kappa atom to the backbone Oxygen of the theta amino acid.
K2ndist   The distance from the second kappa atom to the backbone Nitrogen of the theta amino acid.
K2tdist   The distance from the second kappa atom to the theta atom.
K2num     The atom number for the second kappa atom.
K2name    The atom name for the second kappa atom.
K2aanum   The amino acid number for the second kappa atom.
K2aaname  The amino acid name for the second kappa atom.
Cx        The Theta-superimposed x coordinate for the backbone Carbon atom of the theta AA.
Cy        The Theta-superimposed y coordinate for the backbone Carbon atom of the theta AA.
Cz        The Theta-superimposed z coordinate for the backbone Carbon atom of the theta AA.
CsphR     The Theta-superimposed spherical polar distance for the backbone Carbon of the theta AA.
CsphT     The Theta-superimposed spherical polar latitude angle for the backbone Carbon of the theta AA.
CsphP     The Theta-superimposed spherical polar longitude angle for the backbone Carbon of the theta AA.
CcylR     The Theta-superimposed cylindrical polar distance for the backbone Carbon of the theta AA.
CcylT     The Theta-superimposed cylindrical polar angle for the backbone Carbon of the theta AA.
CcylZ     The Theta-superimposed cylindrical polar z coordinate for the backbone Carbon of the theta AA.
Ox        The Theta-superimposed x coordinate for the backbone Oxygen atom of the theta AA.
Oy        The Theta-superimposed y coordinate for the backbone Oxygen atom of the theta AA.
Oz        The Theta-superimposed z coordinate for the backbone Oxygen atom of the theta AA.
OsphR     The Theta-superimposed spherical polar distance for the backbone Oxygen of the theta AA.
OsphT     The Theta-superimposed spherical polar latitude angle for the backbone Oxygen of the theta AA.
OsphP     The Theta-superimposed spherical polar longitude angle for the backbone Oxygen of the theta AA.
OcylR     The Theta-superimposed cylindrical polar distance for the backbone Oxygen of the theta AA.
OcylT     The Theta-superimposed cylindrical polar angle for the backbone Oxygen of the theta AA.
OcylZ     The Theta-superimposed cylindrical polar z coordinate for the backbone Oxygen of the theta AA.
Nx        The Theta-superimposed x coordinate for the backbone Nitrogen atom of the theta AA.
Ny        The Theta-superimposed y coordinate for the backbone Nitrogen atom of the theta AA.
Nz        The Theta-superimposed z coordinate for the backbone Nitrogen atom of the theta AA.
NsphR     The Theta-superimposed spherical polar distance for the backbone Nitrogen of the theta AA.
NsphT     The Theta-superimposed spherical polar latitude angle for the backbone Nitrogen of the theta AA.
NsphP     The Theta-superimposed spherical polar longitude angle for the backbone Nitrogen of the theta AA.
NcylR     The Theta-superimposed cylindrical polar distance for the backbone Nitrogen of the theta AA.
NcylT     The Theta-superimposed cylindrical polar angle for the backbone Nitrogen of the theta AA.
NcylZ     The Theta-superimposed cylindrical polar z coordinate for the backbone Nitrogen of the theta AA.
Cax       The Theta-superimposed x coordinate for the Alpha Carbon atom of the theta AA.
Cay       The Theta-superimposed y coordinate for the Alpha Carbon atom of the theta AA.
Caz       The Theta-superimposed z coordinate for the Alpha Carbon atom of the theta AA.
CasphR    The Theta-superimposed spherical polar distance for the Alpha Carbon of the theta AA.
CasphT    The Theta-superimposed spherical polar latitude angle for the Alpha Carbon of the theta AA.
CasphP    The Theta-superimposed spherical polar longitude angle for the Alpha Carbon of the theta AA.
CacylR    The Theta-superimposed cylindrical polar distance for the Alpha Carbon of the theta AA.
CacylT    The Theta-superimposed cylindrical polar angle for the Alpha Carbon of the theta AA.
CacylZ    The Theta-superimposed cylindrical polar z coordinate for the Alpha Carbon of the theta AA.
Tx        The Theta-superimposed x coordinate for the theta atom.
Ty        The Theta-superimposed y coordinate for the theta atom.
Tz        The Theta-superimposed z coordinate for the theta atom.
TsphR     The Theta-superimposed spherical polar distance for the theta atom.
TsphT     The Theta-superimposed spherical polar latitude angle for the theta atom.
TsphP     The Theta-superimposed spherical polar longitude angle for the theta atom.
TcylR     The Theta-superimposed cylindrical polar distance for the theta atom.
TcylT     The Theta-superimposed cylindrical polar angle for the theta atom.
TcylZ     The Theta-superimposed cylindrical polar z coordinate for the theta atom.
Ix        The Theta-superimposed x coordinate for the iota atom.
Iy        The Theta-superimposed y coordinate for the iota atom.
Iz        The Theta-superimposed z coordinate for the iota atom.
IsphR     The Theta-superimposed spherical polar distance for the iota atom.
IsphT     The Theta-superimposed spherical polar latitude angle for the iota atom.
IsphP     The Theta-superimposed spherical polar longitude angle for the iota atom.
IcylR     The Theta-superimposed cylindrical polar distance for the iota atom.
IcylT     The Theta-superimposed cylindrical polar angle for the iota atom.
IcylZ     The Theta-superimposed cylindrical polar z coordinate for the iota atom.
K1x       The Theta-superimposed x coordinate for the first kappa atom.
K1y       The Theta-superimposed y coordinate for the first kappa atom.
K1z       The Theta-superimposed z coordinate for the first kappa atom.
K1sphR    The Theta-superimposed spherical polar distance for the first kappa atom.
K1sphT    The Theta-superimposed spherical polar latitude angle for the first kappa atom.
K1sphP    The Theta-superimposed spherical polar longitude angle for the first kappa atom.
K1cylR    The Theta-superimposed cylindrical polar distance for the first kappa atom.
K1cylT    The Theta-superimposed cylindrical polar angle for the first kappa atom.
K1cylZ    The Theta-superimposed cylindrical polar z coordinate for the first kappa atom.
K2x       The Theta-superimposed x coordinate for the second kappa atom.
K2y       The Theta-superimposed y coordinate for the second kappa atom.
K2z       The Theta-superimposed z coordinate for the second kappa atom.
K2sphR    The Theta-superimposed spherical polar distance for the second kappa atom.
K2sphT    The Theta-superimposed spherical polar latitude angle for the second kappa atom.
K2sphP    The Theta-superimposed spherical polar longitude angle for the second kappa atom.
K2cylR    The Theta-superimposed cylindrical polar distance for the second kappa atom.
K2cylT    The Theta-superimposed cylindrical polar angle for the second kappa atom.
K2cylZ    The Theta-superimposed cylindrical polar z coordinate for the second kappa atom.
inter     The type of interaction for the theta & iota contact: MM - Main chain to Main chain, MS - Main chain to Side chain, SM . . . , SS . . . where the first letter refers to theta and the second refers to iota.
Icount    The iota atom count for the particular theta atom.
Tcount    The theta atom count for the particular amino acid.

Visualisation of IL17F Iota Datasets

Each iota dataset was visualized in relation to the IL17F/Fab 496 structure using molecular graphics computer software such as PyMOL. This could be done by direct plotting of the iota dataset as individual points or by first mathematically transforming the dataset into a density function and a file format compatible with molecular graphics display, e.g. CCP4, so that contour maps of higher density could be displayed over the IL17 epitope.
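The transformation of an iota point dataset into a density function can be sketched as follows. This is a minimal illustration only, assuming that a simple count of points per cubic voxel is an acceptable stand-in for the density; the function name bin_points, the 1.0 Å voxel size and the sample coordinates are hypothetical and are not taken from the method described here. Writing the result to a CCP4 map file is not shown.

```python
from collections import Counter
from math import floor

def bin_points(points, voxel=1.0):
    """Count iota data points per cubic voxel (a crude density estimate)."""
    counts = Counter()
    for x, y, z in points:
        # Each point is assigned to the voxel whose integer index contains it.
        counts[(floor(x / voxel), floor(y / voxel), floor(z / voxel))] += 1
    return counts

# Hypothetical iota coordinates (in Å), for illustration only.
iota_points = [(0.2, 0.3, 0.1), (0.7, 0.9, 0.4), (0.5, 0.1, 0.8), (3.2, 1.1, -0.6)]
density = bin_points(iota_points)

# The most populated voxel is where a contour at high density would appear.
peak_voxel, peak_count = density.most_common(1)[0]
print(peak_voxel, peak_count)
```

A finer voxel size gives a more detailed but noisier map; smoothing the counts before contouring would be a natural refinement.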

Inspection of Iota Density Maps for Intersection with Fab 496 Paratope Atoms

The IL17F/Fab 496 interface was examined to determine the degree of intersection between individual Fab 496 atoms per residue and the corresponding iota density maps. Residues were identified where there was little or no intersection. In these cases alternative residues were substituted via the molecular graphics software to determine whether better intersection could be achieved between residue atoms and relevant iota density maps. Amino acid substitutions producing good iota density map intersection were shortlisted for in vitro production and testing as single point mutations as intact IgG versions of Fab 496.

DNA Manipulations and General Methods

E. coli strain INVαF (Invitrogen) was used for transformation and routine culture growth. DNA restriction and modification enzymes were obtained from Roche Diagnostics Ltd. and New England Biolabs. Plasmid preparations were performed using Maxi Plasmid purification kits (Qiagen, catalogue No. 12165). DNA sequencing reactions were performed using ABI Prism Big Dye terminator sequencing kit (catalogue No. 4304149) and run on an ABI 3100 automated sequencer (Applied Biosystems). Data was analysed using the program Auto Assembler (Applied Biosystems). Oligonucleotides were obtained from Invitrogen. The concentration of IgG was determined by IgG assembly ELISA.

Affinity Maturation of Antibody CA028_00496

CA028_00496 is a humanised neutralising antibody which binds both IL17A and IL17F isoforms. It comprises the grafted variable regions, termed gL7 and gH9, whose sequences are disclosed in WO 2008/047134. The wild type Fab′ fragment of this antibody (Fab 496) and mutant variants were prepared as follows: oligonucleotide primer sequences were designed and constructed in order to introduce single point mutations in the light chain variable region (gL7) as per residues and positions determined in the above shortlist. Each mutated light chain was separately sub-cloned into the UCB Celltech human light chain expression vector pKH10.1, which contained DNA encoding the human C-kappa constant region (Km3 allotype). The unaltered heavy chain variable region (gH9) sequence was sub-cloned into the UCB Celltech expression vector pVhg1Fab6His, which contained DNA encoding human heavy chain gamma-1 constant region, CH1. Heavy and light chain encoding plasmids were co-transfected into HEK293 cells using the 293Fectin™ procedure according to the manufacturer's instructions (Invitrogen, catalogue No. 12347-019). IgG1 antibody levels secreted into the culture supernatants after 10 to 12 days culture were assessed by ELISA and binding kinetics were assessed by surface plasmon resonance (see below). Mutants showing improved or similar binding to IL17F were then prepared and tested in combination as double, triple, quadruple or quintuple light chain mutations as above.

Surface Plasmon Resonance (SPR)

All SPR experiments were carried out on a Biacore 3000 system (Biacore AB) at 25 °C using HBS-EP running buffer (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.005% (v/v) surfactant P20, Biacore AB). Goat F(ab′)₂ anti-IgG Fab′ specific antibody (Jackson Labs, product code 109-006-097) was covalently attached to the surface of a CM5 sensor chip (GE Healthcare) by the amine coupling method, as recommended by the manufacturers. Briefly, the carboxymethyl dextran surface was activated with a fresh mixture of 50 mM N-hydroxysuccinimide and 200 mM 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide for 5 minutes at a flow rate of 10 μl/min. Anti-Fab antibody at 50 μg/ml in 10 mM sodium acetate pH 5.0 buffer was injected for 60 sec at the same flow rate. Finally the surface was deactivated with a 10 minute pulse of 1 M ethanolamine.HCl pH 8.5, leaving 4000 to 5000 response units (RU) of immobilized antibody on the chip. A reference flow cell was prepared on the same chip by omitting the protein from the above procedure.

Wild type and mutated 496 Fabs were harvested from culture supernatants in the range 3 to 30 μg/ml and crude supernatants were diluted in running buffer into the range 0.5 to 2 μg/ml. In order to evaluate binding kinetics to IL17F, each antibody was first captured on the anti-Fab′ surface by injection at 10 μl/min for 60 sec to yield an additional 150 to 250 RU signal. Recombinant human IL17F was titrated from 10 nM in running buffer and injected at 30 μl/min, to produce an association phase over 180 sec followed by a dissociation phase of 300 sec. At the end of each cycle the surface was regenerated with a 60 sec pulse of 40 mM HCl followed by a 30 sec pulse of 5 mM NaOH at 10 μl/min. For each Fab′ a control cycle was carried out where the IL17F injection was replaced with an injection of running buffer.

Sensorgrams were corrected by subtraction of the reference flow cell signal, then by subtracting the control cycle sensorgram for the respective Fab′. Dissociation rate constants (k_(d)) and association rate constants (k_(a)) were fitted to the data using BIAevaluation software (Biacore AB). Fab affinities (K_(D)) were calculated as K_(D)=k_(d)/k_(a).
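As a worked illustration of the relation K_(D)=k_(d)/k_(a), the short sketch below applies it to the rate constants reported in Table 4. The function names are hypothetical; only the formula and the input values come from this document.

```python
def affinity_constant(k_d, k_a):
    """Return K_D in mol/l from k_d (s^-1) and k_a (M^-1 s^-1)."""
    if k_a <= 0:
        raise ValueError("k_a must be positive")
    return k_d / k_a

def to_picomolar(k_D_molar):
    """Convert a molar affinity constant to picomolar."""
    return k_D_molar * 1e12

# Rate constants as reported in Table 4.
wild_type = affinity_constant(k_d=4.1e-3, k_a=2.0e6)
quintuple = affinity_constant(k_d=2.6e-5, k_a=2.3e6)

print(f"wild type: {to_picomolar(wild_type):.0f} pM")
print(f"T30R/R54S/S56I/S60D/T72R: {to_picomolar(quintuple):.1f} pM")
print(f"fold improvement: {wild_type / quintuple:.0f}")
```

With these inputs the calculation gives roughly 2 nM for the wild type and roughly 11 pM for the quintuple mutant, in line with the tabulated K_(D) values.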

Results

Intersection of Iota Density Maps with Fab 496 Atoms

Inspection of that part of the surface of Fab 496 forming the interface with IL17F revealed that in many areas there was complete intersection between a given Fab 496 atom and the corresponding iota density map. This was particularly the case for amino acid residues comprising the Fab 496 heavy chain. However, there were a number of regions comprising the light chain where there was little or no intersection of Fab 496 atoms and corresponding iota density maps (Panel A, FIGS. 8 to 13). By substituting alternative residues into the Fab 496 structure at these positions it was possible to achieve intersection between one or more atoms of the substituted residue and the corresponding iota density maps (Panel B, FIGS. 8 to 13). Thus on the Fab 496 light chain the threonine 30 Cγ2 atom does not intersect with the methyl carbon iota density map and the Oγ1 atom does not intersect with the hydroxyl oxygen iota density map. However, when threonine 30 is mutated to arginine, there is intersection between the NE atom and the secondary amide iota density map (FIG. 8).

FIG. 9 shows tetrahedral methylene iota density and secondary amide iota density dispersed around light chain arginine 54 without any intersection with the respective side chain atoms of this residue. Conversely, when this position is mutated to a serine there is good intersection of hydroxyl oxygen iota density and the gamma oxygen atom of the serine side chain. The light chain serine 56 gamma oxygen atom does not intersect with hydroxyl oxygen iota density. However, when this residue is mutated to isoleucine there is intersection of the delta 1 methyl atom of the isoleucine side chain and methyl iota density (FIG. 10). In the case of serine 60, again the gamma oxygen is too distant from adjacent hydroxyl oxygen iota density to achieve intersection (FIG. 11), but mutating this residue to aspartate brings about intersection of the side chain delta oxygen atom with carboxyl oxygen iota density.

Light chain threonine 72 is not a CDR residue but part of framework 3; its side chain methyl atom does not intersect with methyl carbon iota density, nor does its side chain oxygen atom intersect with hydroxyl oxygen iota density. However, the arginine 72 mutation allows intersection both of the side chain delta carbon atom with methylene carbon iota density and of the side chain eta nitrogen atoms with guanidinium nitrogen iota density.

Effects of Iota Designed Mutations in Antibody CA028_00496 on Binding Kinetics

Five iota designed single point light chain mutations in Fab 496, T30R, R54S, S56I, S60D and T72R, showed small improvements in binding affinity ranging from 1.7 to 3.6 fold. The improvement observed with the non-CDR mutation, T72R, was surprising since residues at this position do not normally contact the antigen. For all but one of these mutations (S60D), the improvement was driven by a reduction in dissociation rate constant (Table 3). Combinations of these mutations in pairs resulted in a synergistic improvement in binding, with a 3.8 to 7.5 fold reduction in dissociation rate constant, with triple and quadruple combinations producing further step reductions in dissociation rate constant (Table 3).

The combination of all five light chain mutations produced the largest improvement in binding affinity (Table 4) to give an affinity value of 11 pM to IL17F, some 180-fold better than the original Fab 496. An important finding was that there was no deleterious effect on the binding to the IL17A isoform; in fact there was an improvement, with an affinity constant of 2 pM compared to 14 pM for the original Fab 496. It is interesting that combinations of the designed mutations produce a synergistic enhancement of affinity; one explanation is that they are reasonably spaced across the light chain paratope surface (FIG. 13) and therefore avoid negative interaction effects.

TABLE 3 Effect of single residue and combined residue light chain mutations of antibody CA028_00496 on IL17F dissociation rate constant

CA028_00496 light chain mutations   k_(d) (s⁻¹)   fold change
wt                                  4.2E−03
T30R                                3.2E−03       1.3
R54S                                2.1E−03       2.0
S56I                                3.0E−03       1.4
S60D                                4.1E−03       1.0
T72R                                2.5E−03       1.7
T30R/R54S                           7.0E−04       5.8
T30R/S56I                           1.1E−03       3.8
R54S/S56I                           6.2E−04       6.6
R54S/T72R                           5.5E−04       7.5
S56I/T72R                           8.0E−04       5.1
S60D/T72R                           1.2E−03       3.6
T30R/R54S/T72R                      2.5E−04       19
T30R/S56I/T72R                      4.1E−04       10
T30R/S56I/S60D/T72R                 3.1E−04       14
T30R/R54S/S56I/T72R                 1.3E−04       35
T30R/R54S/S60D/T72R                 9.3E−05       44
T30R/R54S/S56I/S60D/T72R            5.4E−05       104

TABLE 4 Affinity constant of antibody CA028_00496 versus variant comprising 5 light chain mutations

                           k_(a) (M⁻¹s⁻¹)   k_(d) (s⁻¹)   K_(D) (M)   K_(D) (pM)
wild type                  2.0E+06          4.1E−03       2.0E−09     2000
T30R/R54S/S56I/S60D/T72R   2.3E+06          2.6E−05       1.1E−11     11

Conclusion

The method of creating iota density maps over the epitope surface of IL17F in order to predict favourable mutations in the corresponding antibody, Fab 496, has proven to be successful in that the affinity constant has been improved 180-fold to a K_(D) of 11 pM. This method does not assume that only CDR mutations can be used but demonstrates that framework region mutations are also important for affinity maturation.

EXAMPLE 2

Utilisation of an Automated Process of the Invention in the Region Between an Ab's Heavy-Light Chains to Improve Stability

Methods

Identification of Target (Theta) Atoms of Both the Heavy and Light Chains of the Fv Interface of Fab X

Using the coordinates of the crystal structure of an antibody Fab fragment “Fab X” complexed with antigen, all heavy chain atoms within 6 Å of any light chain atoms were identified as epitope atoms (167 theta atoms listed in Table 5). Similarly, all light chain atoms within 6 Å of any heavy chain atoms were identified as epitope atoms (159 theta atoms listed in Table 5).
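The distance-based selection of theta atoms described above can be sketched as a brute-force search. This is an illustrative sketch only: the function name interface_atoms and the coordinates are hypothetical, the labels merely follow the chain-residue-atom notation of Table 5, and a real structure would be read from a coordinate file.

```python
from math import dist

def interface_atoms(chain_a, chain_b, cutoff=6.0):
    """Return labels of chain_a atoms lying within `cutoff` Å of any chain_b atom."""
    return [
        label
        for label, xyz in chain_a
        if any(dist(xyz, other) <= cutoff for _, other in chain_b)
    ]

# Hypothetical coordinates (in Å), for illustration only.
heavy = [
    ("H-47-CA", (0.0, 0.0, 0.0)),
    ("H-47-CB", (1.5, 0.0, 0.0)),
    ("H-10-CA", (20.0, 0.0, 0.0)),
]
light = [
    ("L-43-CZ", (4.0, 3.0, 0.0)),
    ("L-98-OH", (0.0, 9.0, 0.0)),
]

heavy_theta = interface_atoms(heavy, light)   # heavy chain atoms near the light chain
light_theta = interface_atoms(light, heavy)   # light chain atoms near the heavy chain
print(heavy_theta, light_theta)
```

The search is quadratic in the number of atoms, which is adequate for a single Fv interface; a spatial grid or k-d tree would be the usual choice for larger systems.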

TABLE 5 List of atoms comprising the heavy and light chain epitopes where the notation (1)-(2)-(3) designates (1) the respective H and L chains of Fab X, (2) residue number and (3) the atom type.

Binding pocket (THETA) = H, Binding partner (IOTA) = L

H-47-CA H-47-CB H-47-CG1 H-47-CG2
H-49-CB H-49-CD H-49-CG H-49-NE2 H-49-OE1
H-53-C H-53-CA H-53-O
H-54-C H-54-CA H-54-N H-54-O
H-55-C H-55-CA H-55-CB H-55-CD1 H-55-CD2 H-55-CG H-55-N H-55-O
H-56-C H-56-CA H-56-CB H-56-N H-56-O
H-57-C H-57-CA H-57-CB H-57-CD1 H-57-CD2 H-57-CE2 H-57-CE3 H-57-CG H-57-CH2 H-57-CZ2 H-57-CZ3 H-57-N H-57-NE1 H-57-O
H-60-CB H-60-CD1 H-60-CG1 H-60-CG2
H-62-CE2 H-62-CH2 H-62-CZ2 H-62-NE1
H-68-C H-68-CA H-68-CB H-68-CD1 H-68-CD2 H-68-CE1 H-68-CE2 H-68-CG H-68-CZ H-68-N H-68-O
H-69-C H-69-CA H-69-N H-69-O
H-70-C H-70-CA H-70-CB H-70-N H-70-O
H-71-C H-71-CA H-71-CB H-71-CG2 H-71-N H-71-OG1
H-72-N
H-104-CD1 H-104-CD2 H-104-CE1 H-104-CE2 H-104-CG H-104-CZ H-104-OH
H-109-CB H-109-CG1 H-109-CG2
H-112-CD1 H-112-CE1 H-112-CE2 H-112-CZ H-112-OH
H-113-C H-113-CA H-113-CB H-113-O H-113-OG
H-114-C H-114-CA H-114-CB H-114-CG2 H-114-N H-114-O H-114-OG1
H-115-C H-115-CA H-115-CB H-115-N H-115-O
H-116-C H-116-CA H-116-CB H-116-CD H-116-CG H-116-N H-116-O
H-117-C H-117-CA H-117-CB H-117-CD1 H-117-CD2 H-117-CE1 H-117-CE2 H-117-CG H-117-CZ H-117-N H-117-O H-117-OH
H-118-C H-118-CA H-118-CB H-118-CD1 H-118-CD2 H-118-CE1 H-118-CE2 H-118-CG H-118-CZ H-118-N H-118-O
H-119-C H-119-CA H-119-CB H-119-CG H-119-N H-119-O
H-121-C H-121-CA H-121-CB H-121-CD1 H-121-CD2 H-121-CE2 H-121-CE3 H-121-CG H-121-CH2 H-121-CZ2 H-121-CZ3 H-121-N H-121-NE1 H-121-O
H-122-C H-122-CA H-122-N H-122-O
H-123-C H-123-CA H-123-O

Binding pocket (THETA) = L, Binding partner (IOTA) = H

L-11-OD1
L-41-O
L-42-C L-42-CA L-42-CB L-42-O
L-43-C L-43-CA L-43-CB L-43-CD1 L-43-CD2 L-43-CE1 L-43-CE2 L-43-CG L-43-CZ L-43-N L-43-O
L-44-C L-44-CA L-44-N L-44-O
L-45-C L-45-CA L-45-CB L-45-N L-45-O L-45-OG
L-47-CB L-47-CD1 L-47-CD2 L-47-CE1 L-47-CE2 L-47-CG L-47-CZ L-47-OH
L-49-CB L-49-CD L-49-CG L-49-NE2 L-49-OE1
L-52-C L-52-O
L-53-C L-53-CA L-53-N L-53-O
L-54-C L-54-CA L-54-CB L-54-N L-54-O
L-55-C L-55-CA L-55-CB L-55-CD L-55-CG L-55-N L-55-O
L-56-C L-56-CA L-56-CB L-56-N L-56-O
L-57-C L-57-CA L-57-CB L-57-CD1 L-57-CD2 L-57-CG L-57-N
L-59-C L-59-O
L-60-C L-60-CA L-60-CB L-60-CD1 L-60-CD2 L-60-CE1 L-60-CE2 L-60-CG L-60-CZ L-60-N L-60-O
L-61-C L-61-CA L-61-CB L-61-CD L-61-CG L-61-N L-61-O L-61-OE1 L-61-OE2
L-62-N
L-64-NZ
L-66-CG2
L-98-CB L-98-CD1 L-98-CD2 L-98-CE1 L-98-CE2 L-98-CG L-98-CZ L-98-OH
L-100-C L-100-CA L-100-N L-100-O
L-101-C L-101-CA L-101-N L-101-O
L-102-C L-102-CA L-102-N
L-105-O
L-106-C L-106-CA L-106-CB L-106-CD1 L-106-CG1 L-106-CG2 L-106-N L-106-O
L-107-C L-107-CA L-107-CB L-107-N L-107-O L-107-OG
L-108-C L-108-CA L-108-CB L-108-CG L-108-N L-108-OD1 L-108-OD2
L-109-C L-109-CA L-109-CB L-109-CG2 L-109-N L-109-O L-109-OG1
L-110-CA L-110-N
L-111-CA L-111-CB L-111-CD1 L-111-CD2 L-111-CE1 L-111-CE2 L-111-CG L-111-CZ L-111-N L-111-O
L-112-O
L-113-C L-113-CA L-113-O

Superimposition of Iota Data Over the Heavy and Light Chain Interface Surface

For each of the 167 theta atoms comprising the heavy chain epitope, the corresponding theta contact set was selected from the IOTA database and from that, an appropriate iota sub-group was selected, e.g. carbonyl oxygen. The relative iota coordinates from this sub-group were transposed relative to the reference frame of the given theta atom of the heavy chain epitope. An iota dataset for a given sub-group was thus accumulated over the whole heavy chain epitope. Where the location of a given iota data point intersected with an atom of the heavy chain, that is, lay closer than the sum of their respective van der Waals radii minus 0.2 Å, the data point was excluded from the dataset. The process was repeated for all relevant iota sub-groups to produce a series of iota datasets for the heavy chain epitope.

For each of the 159 theta atoms comprising the light chain epitope, the corresponding theta contact set was selected from the IOTA database and from that, an appropriate iota sub-group was selected, e.g. carbonyl oxygen. The relative iota coordinates from this sub-group were transposed relative to the reference frame of the given theta atom of the light chain epitope. An iota dataset for a given sub-group was thus accumulated over the whole light chain epitope. Where the location of a given iota data point intersected with an atom of the light chain, that is, lay closer than the sum of their respective van der Waals radii minus 0.2 Å, the data point was excluded from the dataset. The process was repeated for all relevant iota sub-groups to produce a series of iota datasets for the light chain epitope.
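The exclusion of iota data points that intersect chain atoms can be sketched as a simple distance filter. This is a minimal sketch under stated assumptions: the radii are commonly quoted Bondi-type van der Waals values chosen for illustration (the radii actually used are not given here), and the function name, atom list and coordinates are hypothetical.

```python
from math import dist

# Approximate van der Waals radii in Å; illustrative values, not taken from the source.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def remove_clashing_points(iota_points, iota_element, chain_atoms, tolerance=0.2):
    """Drop iota points closer to any chain atom than (r_iota + r_atom - tolerance)."""
    r_iota = VDW_RADIUS[iota_element]
    kept = []
    for point in iota_points:
        clashes = any(
            dist(point, xyz) < r_iota + VDW_RADIUS[element] - tolerance
            for element, xyz in chain_atoms
        )
        if not clashes:
            kept.append(point)
    return kept

# Hypothetical data: carbonyl oxygen iota points near two chain atoms.
chain_atoms = [("C", (0.0, 0.0, 0.0)), ("N", (1.3, 0.0, 0.0))]
iota_points = [(0.0, 2.5, 0.0), (0.0, 3.5, 0.0), (4.5, 0.0, 0.0)]

filtered = remove_clashing_points(iota_points, "O", chain_atoms)
print(filtered)
```

In this example the first point lies 2.5 Å from the carbon atom, inside the 3.02 Å exclusion distance for an oxygen-carbon pair, so it is discarded while the other two points are retained.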

Inspection of Iota Density Maps for Intersection with Heavy-Light Chain Atoms

The whole process was performed automatically with an internal customised Rosetta Python library script tailored for mutable position identification, single point mutant generation, low-energy rotamer state enumeration, quantitative IOTA score computation, VH-VL chain binding energy estimation, and point mutant prioritisation.

Two scoring methods were used for ranking mutants:

1. ΔIOTAScore

The IOTA density maps described above were used to compute the spatial intersection values between each heavy atom of the residue at each mutable position and the density critical points in nearby maps of the corresponding type. IOTAScore is the sum of the volumetric overlaps between the heavy atoms of one residue and the maxima of the IOTA densities of the corresponding type definitions; it reflects the degree of intersection between individual Fab X atoms per residue and the corresponding iota density maps. IOTAScore is numerically negative, with lower values implying more intersection. ΔIOTAScore is the change in IOTAScore between the mutant residue and the wildtype one; the more negative the ΔIOTAScore value, the greater the implication that the mutant is more favoured than the wildtype one.

2. Rosetta ΔΔG score

The Rosetta energy function is a linear combination of terms that model interaction forces between atoms, solvation effects, and torsion energies. More specifically, Score12, the default full atom energy function in Rosetta, is composed of a Lennard-Jones term, an implicit solvation term, an orientation-dependent hydrogen bond term, sidechain and backbone torsion potentials derived from the PDB, a short-ranged knowledge-based electrostatic term, and reference energies for each of the 20 amino acids that model the unfolded state. The binding strength between two binding partners, or ΔG, can be computed by subtracting the Rosetta scores of the individual partners alone from that of the complex structure formed by the two partners. Lower ΔG implies stronger binding. ΔΔG is the change of ΔG between the mutant complex and the wildtype one; the more negative the ΔΔG value, the greater the implication that the mutant binding affinity is higher than the wildtype one.
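The sign conventions for ΔG and ΔΔG can be made concrete with a short Python sketch. The function names and the numerical scores are hypothetical and chosen only to illustrate the arithmetic; they are not Rosetta output.

```python
def binding_energy(score_complex, score_partner_a, score_partner_b):
    """dG = score(complex) - score(partner A alone) - score(partner B alone)."""
    return score_complex - score_partner_a - score_partner_b


def ddg(dg_mutant, dg_wildtype):
    """ddG = dG(mutant) - dG(wildtype); negative values favour the mutant."""
    return dg_mutant - dg_wildtype


# Hypothetical Rosetta-style scores in arbitrary energy units, not measured data.
dg_wt = binding_energy(score_complex=-250.0, score_partner_a=-120.0, score_partner_b=-110.0)
dg_mut = binding_energy(score_complex=-253.5, score_partner_a=-121.0, score_partner_b=-110.0)
print(dg_wt, dg_mut, ddg(dg_mut, dg_wt))  # -20.0 -22.5 -2.5
```

In this toy case the mutant complex binds 2.5 units more strongly than the wildtype, so ΔΔG is negative.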

FIG. 14 illustrates the workflow for in silico prediction of point mutations at the VH-VL interface of the Fab X structure.

In step S101, all residues on the heavy chain with at least one heavy atom within 8 Å of any light chain heavy atom were identified as mutable positions. Similarly, all residues on the light chain with at least one heavy atom within 8 Å of any heavy chain heavy atom were identified as mutable positions.
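A minimal sketch of the distance test in step S101 is given below. It assumes each chain is held as a mapping from residue number to a list of heavy-atom coordinates; this data layout, the function name and the toy coordinates are hypothetical and are not taken from the script used in the example.

```python
import math

CUTOFF = 8.0  # angstroms, as in step S101


def interface_residues(chain_a, chain_b, cutoff=CUTOFF):
    """Residue ids of chain_a with at least one heavy atom within cutoff of chain_b.

    Each chain maps a residue id to a list of heavy-atom (x, y, z) coordinates.
    """
    partner_atoms = [xyz for atoms in chain_b.values() for xyz in atoms]
    return sorted(
        res_id
        for res_id, atoms in chain_a.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    )


# Toy coordinates, hypothetical and not taken from the Fab X structure.
heavy = {71: [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], 30: [(40.0, 0.0, 0.0)]}
light = {109: [(6.0, 0.0, 0.0)], 5: [(60.0, 0.0, 0.0)]}

mutable_heavy = interface_residues(heavy, light)
mutable_light = interface_residues(light, heavy)
print(mutable_heavy, mutable_light)  # [71] [109]
```

The same function is applied in both directions, mirroring the symmetric treatment of the heavy and light chains in the text.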

In step S102, for the wildtype Fab X crystal structure, the residue-wise IOTAScores and the binding energy ΔG are computed. In step S102.1, the IOTAScore for the wildtype residue at the current mutable position is computed against the nearby IOTA density maps of the corresponding type and is termed (IOTAScore_(wt), Position); in step S102.2, the binding energy of the wildtype Fab X VH and VL chains is computed with the Rosetta score12 function and is termed ΔG_(wt).

In step S103, the wildtype residue at the current mutable position identified in step S101 is replaced (mutated) by the other amino acid types. Out of the 20 natural amino acid types, proline and cysteine are excluded from mutation. All the other 18 types (alanine, arginine, asparagine, aspartic acid, glutamic acid, glutamine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, serine, threonine, tryptophan, tyrosine, valine) except the wildtype itself are introduced at each mutable position one by one.
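The enumeration in step S103 amounts to a simple set difference, sketched below in Python. The one-letter codes and the function name are illustrative choices made here, not identifiers from the original script.

```python
# One-letter codes of the 20 natural amino acids.
NATURAL_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EXCLUDED = {"P", "C"}  # proline and cysteine are never introduced

ALLOWED = [aa for aa in NATURAL_AMINO_ACIDS if aa not in EXCLUDED]


def candidate_mutations(wildtype):
    """Amino acid types to try at one position: the 18 allowed types minus the wildtype."""
    return [aa for aa in ALLOWED if aa != wildtype]


print(len(ALLOWED))                   # 18
print(len(candidate_mutations("T")))  # 17, threonine itself is skipped
print(len(candidate_mutations("P")))  # 18, a wildtype proline is not in the allowed set
```

Under this reading, a position whose wildtype residue is one of the 18 allowed types yields 17 single point mutants.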

In step S104, for each mutated residue type at each mutable position, the top 100 lowest-energy (in terms of Rosetta scoring function) rotamer states are generated using Rosetta. The other high energy rotamer states are discarded.

In step S105, for each rotamer state of the mutant residue generated in S104, the IOTAScore is computed in the same way as in step S102 and is termed (IOTAScore_(mutant), Position_(j), Type_(k), Rotamer_(i)).

In step S106, the ΔIOTAScore for the current combination of rotamer state, mutant residue type, and mutable position is computed by subtracting (IOTAScore_(wt), Position) from (IOTAScore_(mutant), Position_(j), Type_(k), Rotamer_(i)), and is termed (ΔIOTAScore, Position_(j), Type_(k), Rotamer_(i)). Steps S105 and S106 were repeated to compute the ΔIOTAScores for all rotamer states of the current mutant residue type and mutable position.

In step S107, the optimal rotamer state of the current mutant residue type and mutable position is determined as the one with the lowest ΔIOTAScore value, as shown in step S107.1. The binding energy of the mutant with the optimal rotamer state is computed in step S107.2 in the same way as in step S102.2 and is termed (ΔG_(mutant), Position_(j), Type_(k)). In step S107.3, the change of binding energy ΔΔG between mutant and wildtype is calculated by subtracting ΔG_(wt) from ΔG_(mutant). After the optimal rotamer state is prioritised, steps S103 to S107 were repeated for the next mutant amino acid type at the current mutable position.
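The rotamer selection of steps S105 to S107.1 can be sketched as follows. The sketch assumes ΔIOTAScore is taken as the mutant score minus the wildtype score, in line with the statement that more negative values favour the mutant; the function name and the scores are hypothetical.

```python
def best_rotamer(iota_score_wt, rotamer_scores):
    """Return (rotamer_index, delta_iota) for the rotamer with the lowest dIOTAScore.

    delta_iota = IOTAScore(mutant rotamer) - IOTAScore(wildtype residue).
    """
    deltas = [(score - iota_score_wt, idx) for idx, score in enumerate(rotamer_scores)]
    delta, idx = min(deltas)
    return idx, delta


# Hypothetical IOTAScores for three rotamer states of one mutant residue type.
iota_wt = -10.0
rotamer_scores = [-12.5, -40.0, -8.0]
idx, delta = best_rotamer(iota_wt, rotamer_scores)
print(idx, delta)  # 1 -30.0
```

Only the selected rotamer would then be carried forward to the binding energy calculation of step S107.2.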

In step S108, for the current mutable position, only the candidate mutants satisfying the criteria of both ΔIOTAScore<0 and ΔΔG<0 are kept for later ranking. The rest are discarded. Steps S102 to S108 were repeated to go through all the mutable positions and generate all candidate mutants satisfying the same criteria.

In step S109, all the candidate mutant structures were output for later visualisation analysis. The final list of candidate mutants was sorted and ranked by the lowest ΔIOTAScores.
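The filtering of step S108 and the ranking of step S109 can be illustrated with a short Python sketch. The three named mutants use the heavy chain ΔΔG and ΔIOTAScore values reported in Table 6; the two entries labelled hypothetical are invented here solely to show candidates being rejected, and the class and function names are assumptions.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    ddg: float
    delta_iota: float


def rank_candidates(candidates):
    """Keep mutants with dIOTAScore < 0 and ddG < 0, sorted by lowest dIOTAScore."""
    kept = [c for c in candidates if c.delta_iota < 0 and c.ddg < 0]
    return sorted(kept, key=lambda c: c.delta_iota)


candidates = [
    Candidate("T71H", -1.06, -38.94),           # Table 6
    Candidate("T71R", -0.37, -56.27),           # Table 6
    Candidate("V109R", -0.04, -50.58),          # Table 6
    Candidate("hypothetical_A", 0.50, -60.00),  # invented: fails ddG < 0
    Candidate("hypothetical_B", -1.00, 3.00),   # invented: fails dIOTAScore < 0
]
ranked = [c.name for c in rank_candidates(candidates)]
print(ranked)  # ['T71R', 'V109R', 'T71H']
```

The resulting order of the three real entries matches their relative ranking in Table 6.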

The running command and parameters used were as below:

For light chain mutations prediction, the command was:

“python multiRotamersFabInterfaceIOTAScan.py --pdb FabX.pdb --only_chains L --region all --useIOTA --IOTAtype 167 --output_mutant”

For heavy chain mutations prediction, the command was:

“python multiRotamersFabInterfaceIOTAScan.py --pdb FabX.pdb --only_chains H --region all --useIOTA --IOTAtype 167 --output_mutant”

Extra Rosetta relevant parameters were initialized by adding the following code to the “multiRotamersFabInterfaceIOTAScan.py”:

“init(extra_options="-ex1 -ex2 -score:weights score12 -no_his_his_pairE -constant_seed -edensity:mapreso 3.0 -correct -mute all")”

DNA Manipulations and General Methods

E. coli strain INVαF (Invitrogen) was used for transformation and routine culture growth. DNA restriction and modification enzymes were obtained from Roche Diagnostics Ltd. and New England Biolabs. Plasmid preparations were performed using Maxi Plasmid purification kits (Qiagen, catalogue No. 12165). DNA sequencing reactions were performed using ABI Prism Big Dye terminator sequencing kit (catalogue No. 4304149) and run on an ABI 3100 automated sequencer (Applied Biosystems). Data was analysed using the program Auto Assembler (Applied Biosystems). Oligonucleotides were obtained from Invitrogen. The concentration of IgG was determined by IgG assembly ELISA.

Thermostability Improvement of Fab X Through Affinity Maturation of the Heavy-Light Chain Interface

The wild type Fab fragment of Fab X and mutant variants were prepared as follows: oligonucleotide primer sequences were designed and constructed in order to introduce single point mutations in both the heavy and light chain variable regions as per residues and positions determined in the above short list. Each mutated light chain was separately sub-cloned into the UCB Celltech human light chain expression vector pKH10.1, which contained DNA encoding the human C-kappa constant region (Km3 allotype). Each mutated heavy chain variable region sequence was separately sub-cloned into the UCB Celltech expression vector pVhg1Fab6His, which contained DNA encoding human heavy chain gamma-1 constant region, CH1. Heavy and light chain encoding plasmids were co-transfected into HEK293 cells using the 293Fectin™ procedure according to the manufacturer's instructions (Invitrogen, catalogue No. 12347-019). IgG1 Fab antibody levels secreted into the culture supernatants after 10 to 12 days of culture were assessed by ELISA, and binding kinetics were assessed by surface plasmon resonance (see below).

Mutants showing improved thermostability were then prepared and tested in combination as double or triple mutations as above.

Surface Plasmon Resonance (SPR)

All SPR experiments were carried out on a BIAcore T200 (GE Healthcare). Affinipure F(ab′)₂ Fragment goat anti-human IgG, F(ab′)₂ fragment specific (Jackson ImmunoResearch) was immobilised on a CM5 Sensor Chip via amine coupling chemistry to a capture level of ≈5000 response units (RUs). HBS-EP buffer (10 mM HEPES pH 7.4, 0.15 M NaCl, 3 mM EDTA, 0.05% Surfactant P20, GE Healthcare) was used as the running buffer with a flow rate of 10 μL/min. A 10 μL injection of Fab X at 0.75 μg/mL was used for capture by the immobilised anti-human IgG-F(ab′)₂. Antigen was titrated over the captured Fab X at various concentrations (50 nM to 6.25 nM) at a flow rate of 30 μL/min. The surface was regenerated by 2×10 μL injection of 50 mM HCl, followed by a 5 μL injection of 5 mM NaOH at a flow rate of 10 μL/min. Background-subtracted binding curves were analysed using the T200 evaluation software (version 1.0) following standard procedures. Kinetic parameters were determined from the fitting algorithm.

Thermostability Assay

A Thermofluor assay was performed to assess the thermal stabilities of purified molecules. Purified proteins (0.1 mg/ml) were mixed with SYPRO® Orange dye (Invitrogen), and the mixture was dispensed in quadruplicate into a 384 PCR optical well plate. Samples were analysed on a 7900HT Fast Real-Time PCR System (Agilent Technologies) over a temperature range from 20° C. to 99° C., with a ramp rate of 1.1° C./min. Fluorescence intensity changes per well were plotted against temperature and the inflection points of the resulting slopes were used to generate the T_(m).
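One common way to locate the inflection point of a melt curve is to find the temperature at which the slope dF/dT is greatest. The Python sketch below illustrates this on a synthetic sigmoidal curve; the curve, its midpoint and the function name are assumptions made for illustration, and the instrument software used in the example may apply a different fitting procedure.

```python
import math


def melting_temperature(temps, fluorescence):
    """Temperature at the steepest rise of the curve (maximum central-difference dF/dT)."""
    best_slope, best_temp = float("-inf"), None
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_slope, best_temp = slope, temps[i]
    return best_temp


# Synthetic unfolding curve from 20.0 to 99.0 C with an assumed midpoint of 78.0 C.
temps = [20.0 + 0.5 * i for i in range(159)]
fluor = [1.0 / (1.0 + math.exp(-(t - 78.0) / 1.5)) for t in temps]

tm = melting_temperature(temps, fluor)
print(tm)  # 78.0
```

For replicate wells, as in the quadruplicate layout described above, the per-well values would typically be averaged to give a mean T_(m) and a standard deviation.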

Results

Intersection of Iota Density Maps with Heavy and Light Chain Atoms of the Interface

The automated method using a Rosetta scan produced a table of mutations ranked by IOTA score (Table 6).

Effects of Iota Designed Mutations on Fab X on Thermostability

Six iota-designed single point mutations in Fab X, H-T71R, H-T71K, H-T71N, H-T71H, L-S107E and L-T109I, showed small improvements in thermostability ranging from 0.5° C. to 2.9° C. over wild-type (Table 7). FIGS. 15, 16 and 17 provide computer generated visualisations depicting the effects of these single point mutations. In particular, these Figures show that whilst H-T71, L-S107 and L-T109 have no density intersection, H-R71 intersects with the amide density, L-E107 intersects with the carboxylate density and L-I109 intersects with the methyl density. Combinations of these mutations in pairs resulted in a synergistic improvement in thermostability, with triple combinations producing further step improvements in thermostability (Table 8).

The combination of the H-T71R, L-S107E and L-T109I mutations produced the largest improvement in thermostability (Table 8) to give a Tm of 81.2° C., some 5.8° C. better than the original Fab X. This combination of the three mutations is depicted in FIG. 18. An important finding was that there was no significant loss of binding to the antigen.

TABLE 6 Proposed mutations generated by the Rosetta scan method and their ranking in order of IOTA score

Rank (by IOTAScore)  Heavy chain  ΔΔG    ΔIOTAScore  Light chain  ΔΔG    ΔIOTAScore
1                    T71R         −0.37  −56.27      T109H        −1.49  −92.37
2                    V109R        −0.04  −50.58      T109K        −0.01  −67.33
3                    V109K        −0.51  −40.67      T109I        −1.69  −45.17
4                    T71H         −1.06  −38.94      I106L        −2.11  −42.11
5                    T71K         −0.08  −36.09      T109L        −1.58  −37.44
6                    T71W         −0.67  −25.95      I106H        −3.5   −31.77
7                    T71Y         −0.67  −25.64      S107E        −0.62  −18.71
8                    T71N         −0.03  −24.89      A53Y         −0.93  −17.68
9                    D119W        −0.89  −23.58      A53F         −1.13  −15.64
10                   T71Q         −0.38  −23.28      I106N        −0.47  −13.57
11                   V109I        −0.56  −22.64
12                   V109H        −0.4   −21.73

TABLE 7 Thermostability of mutations compared with wild-type thermostability of 75.7° C.

Rank  Heavy chain  ΔΔG    ΔIOTAScore  Tm (° C.)  Light chain  ΔΔG    ΔIOTAScore  Tm (° C.)
1     T71R         −0.37  −56.27      78.6       T109H        −1.49  −92.37      75.2
2     V109R        −0.04  −50.58      75.6       T109K        −0.01  −67.33      ND
3     V109K        −0.51  −40.67      75.6       T109I        −1.69  −45.17      79.2
4     T71H         −1.06  −38.94      76.6       I106L        −2.11  −42.11      74.5
5     T71K         −0.08  −36.09      78.6       T109L        −1.58  −37.44      77.6
6     T71W         −0.67  −25.95      75.6       I106H        −3.5   −31.77      72.2
7     T71Y         −0.67  −25.64      75.3       S107E        −0.62  −18.71      77.3
8     T71N         −0.03  −24.89      76.2       A53Y         −0.93  −17.68      68.8
9     D119W        −0.89  −23.58      73.7       A53F         −1.13  −15.64      67.2
10    T71Q         −0.38  −23.28      75.8       I106N        −0.47  −13.57      71.8
11    V109I        −0.56  −22.64      ND
12    V109H        −0.4   −21.73      75.1

ND = Not Determined

TABLE 8 Thermostability and affinity of combinations of mutations.

Combination                 Rosetta ddG (VH/VL)  Rosetta dE (Fv)  Tm (° C.)  Tm SD  KD (nM)
H-T71R + L-S107E + L-T109I  −3.72                −1.73            81.2       0.3    ND
H-T71R + L-T109I            −2.3                 −1.17            80.1       0      0.88
H-T71R + L-S107E            −2.13                −0.93            80.1       0      3.90
H-T71K + L-S107E            −0.3                 0.76             79.4       0.6    3.50
H-T71K + L-S107E + L-T109L  −4.11                −1.41            79.3       0.3    ND
H-T71K + L-T109I            −1.96                −0.98            79.1       0.3    1.39
H-T71R + L-T109L            −2.77                −0.95            78.0       0.6    0.96
H-T71K + L-T109L            −2.2                 −0.7             77.7       0.1    1.59
H-T71R                      −0.37                                 78.0       0.3    0.80
H-T71K                      −0.08                                 77.9       0.3    1.27
L-S107E                     −0.62                                 77.3       0.5    5.52
L-T109I                     −1.69                                 77.9       0.3    1.47
L-T109L                     −1.58                                 76.9       0.5    1.73
WT                                                                75.4       0.2    1.35

ND = Not Determined

1. A method for designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target, comprising: a) identifying a target list of atoms forming the surface of the target binding site; b) identifying each atom, hereinafter referred to as a theta atom, in the target list, as a particular theta atom type; c) extracting from a structural database of biological macromolecules, information about non-bonding, intra-molecular or inter-molecular atom to atom contacts, where the first atom in a contacting pair of atoms is of a particular theta atom type and the opposing, second atom of the pair, hereinafter referred to as an iota atom, is of a particular iota atom type, said information comprising spatial and/or contextual data about the iota atom relative to the theta atom, and said data collected for a plurality of contacts of the given theta atom type from the said database is hereinafter referred to as a theta contact set; d) for each theta atom identified in the target list in b), superimposing in or around the target binding site data relating to a given iota atom type, or a predetermined group of related iota atom types, from the corresponding theta contact set extracted in c); e) combining and/or parsing the superimposed data in such a way as to predict one or more favoured regions of the binding site where the given iota atom type, or the predetermined group of related iota atom types, has high theoretical propensity; and f) with a candidate ligand notionally docked into the binding site, comparing the type and position of one or more of the atoms of the candidate ligand with the predicted favoured regions for the respective iota atom types, to identify a modification to the candidate ligand, in terms of alternate and/or additional candidate ligand atoms, that will produce a greater intersection between the alternate and/or additional candidate ligand atoms and the respective iota atom type favoured regions, leading to an improvement in the affinity of the modified candidate ligand to the binding site compared to the unmodified candidate ligand; wherein each non-bonding intra-molecular or inter-molecular contact in the database is defined as a contact between opposing residues of a protein fold or between opposing monomer units of a macromolecular fold or between two interacting macromolecular partners and is specifically between a theta atom on one side of the fold or first interacting partner and an iota atom on the opposing side or second interacting partner; in an instance where the following condition is satisfied: s−Rw≦t, where s is the separation between the two atoms of the contact, Rw is the sum of the van der Waals radii of the two atoms of the contact, and t is a predetermined threshold distance; and wherein the theta atom type is identified uniquely in b) such that there is no intersection between the data of a theta contact set extracted in c) for a given theta atom type and the data of any other theta contact set extracted in c) for any other theta atom type, apart from data concerning contacts involving the given theta atom as the iota atom.
 2. The method according to claim 1, wherein for each non-bonding intra-molecular contact extracted from the structural database of protein members in c) the following condition is also satisfied: the theta atom and the iota atom of the contact are on different residues separated by at least four residues along the linear polypeptide or are on separate polypeptide chains.
 3. The method according to claim 1, wherein the theta atom type is identified as being one and only one of: the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins; the 82 non-hydrogen atoms present in the 4 nucleotides of the deoxyribonucleic acid polymer (DNA); the 42 non-hydrogen atoms present in the methylated DNA nucleotides, cytidine phosphate and adenosine phosphate; the 85 non-hydrogen atoms present in the 4 nucleotide phosphates of the ribonucleic acid polymer (RNA); the 89 non-hydrogen atoms present in 2-O′-methylated ribose nucleotide phosphates of RNA; the over 400 non-hydrogen atoms present in the commonest post-transcription base modified RNA.
 4. The method according to claim 1, wherein the information extracted in c) is collected in a secondary database comprising one and only one theta contact set for each of the theta atom types.
 5. The method according to claim 4, wherein each of said secondary database theta contact sets is sub-divided into a plurality of non-overlapping iota atom types or non-overlapping groups of related iota atom types.
 6. The method according to claim 1, wherein the iota atom type is identified as being one and only one of: the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins; the oxygen atom present in protein bound, structurally relevant, water molecules; the 82 non-hydrogen atoms present in the 4 nucleotides of the deoxyribonucleic acid polymer (DNA); the 42 non-hydrogen atoms present in the methylated DNA nucleotides, cytidine phosphate and adenosine phosphate; the 85 non-hydrogen atoms present in the 4 nucleotide phosphates of the ribonucleic acid polymer (RNA); the 89 non-hydrogen atoms present in 2-O′-methylated ribose nucleotide phosphates of RNA; and/or the over 400 non-hydrogen atoms present in the commonest post-transcription modified bases of RNA.
 7. The method according to claim 1, wherein said predetermined group of related iota atom types is one of a plurality of non-overlapping groups obtained by sorting the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins into groups of similar chemical type.
 8. The method according to claim 7, wherein the iota atom types are sorted into the plurality of non-overlapping groups according to one or more of the following factors: elemental nature of the atom type, hybridisation state of the atom type.
 9. The method according to claim 7, wherein the iota atom types are sorted into a plurality of non-overlapping groups comprising the following: C sp³, C sp²(aromatic), C sp²(non-aromatic), N sp³, N sp², O sp³, O sp², S.
 10. The method according to claim 1, wherein: said spatial data extracted in c) defines the position of each iota atom specified in the theta contact set by geometrical reference to the position of the theta atom and to the positions of third and fourth atoms; the third atom is covalently bonded to the theta atom; and the fourth atom is covalently bonded to the third atom.
 11. The method according to claim 10, wherein: for each iota atom specified in the theta contact set, said spatial data extracted in c) defines the position of fifth and sixth atoms by geometrical reference to the position of the theta atom and to the positions of the third and fourth atoms; the fifth atom is covalently bonded to the iota atom; and the sixth atom is covalently bonded to either the fifth atom or the iota atom.
 12. The method according to claim 11, wherein the superimposition in or around the target site of (d) comprises: parsing the theta contact set to extract spatial data for contacts comprising the given iota atom type or one or more of the predetermined group of related iota atom types; and plotting this spatial data to determine theoretical locations representing where each iota atom type, or each of the one or more of the predetermined group of related iota atom types, would be located if: i) the theta atom of the contact were located at the position of the corresponding theta atom in the target binding site; and ii) the third and fourth atoms of the contact were located at the positions of the third and fourth atoms of the corresponding theta atom in the target binding site.
 13. The method according to claim 12, wherein: the extracted spatial data is parsed against said contextual data before said plotting step.
 14. The method according to claim 12, wherein a region in which a density of theoretical locations for the given iota atom type, or for the one or more of the predetermined group of related iota atom types, is above a predetermined threshold is identified as one of the favoured regions.
 15. The method according to claim 12, wherein theoretical locations for the given iota atom type, or for one or more of the predetermined group of related iota atom types, are determined for a plurality of theta atoms on the target list and a region in which a density of the cumulative theoretical locations is above the predetermined threshold is identified as one of the favoured regions.
 16. The method according to claim 12, wherein: if the theoretical location of an individual iota atom intersects with the location of an atom of the target macromolecule closer than Rw−0.2 angstroms then the said iota atom is excluded from subsequent analysis.
 17. The method according to claim 11, wherein: the third and fourth atoms are chosen uniquely for each specified theta atom type.
 18. The method according to claim 11, wherein: the fifth and sixth atoms are chosen uniquely for each specified iota atom type.
 19. The method according to claim 11, wherein: for each favoured region, vectors are derived to describe the position of the fifth atom relative to its respective iota atom and analysis is carried out on said vectors in order to identify a favoured bond vector representing a prediction of the covalent attachment of a theoretical consensus iota atom in the said region, said identified favoured bond vector being used to refine the design of the candidate ligand or modification of the candidate ligand.
 20. The method according to claim 1, wherein: said contextual data extracted in (c), contains contextual information concerning the local environment of each contact pair in the theta contact set, including one or more of the following in any combination: secondary structure, amino acid types or other monomer types comprising the contact pair, adjacent monomer units and/or local geometry thereof in a polymer chain either side of the contact, adjacent amino acids in a polypeptide chain on either side of the contact, local geometry of the said adjacent monomer units or amino acids, temperature factor of the theta atom, temperature factor of the iota atom, accessible surface area of the theta atom, accessible surface area of the iota atom, the number of different iota atom contacts for the particular theta atom and the number of other theta atoms on the same monomer unit as the theta atom.
 21. The method according to claim 1, wherein (f) comprises: identifying a modification of the candidate ligand that increases a degree of overlap between one or more atoms of the candidate ligand and a predicted favoured region or regions for an iota atom type or predetermined group of related iota atom types in the binding site.
 22. The method according to claim 1, wherein a plurality of modifications to the candidate ligand are identified in (f) and the method further comprises selecting a subset of the identified modifications based on one or both of the following: 1) the extent to which the intersection between the alternate and/or additional candidate ligand atoms and the respective iota atom type favoured regions is greater compared to the unmodified candidate ligand; and 2) the extent to which one or more factors contributing to the total energy of the complex formed by the binding of the modified candidate ligand to the binding site is/are reduced compared to the case where the unmodified candidate ligand is bound.
 23. The method according to claim 1, wherein t=2.5 angstroms.
 24. The method according to claim 1, wherein t=0.8 angstroms.
 25. The method according to claim 1, further comprising: out-putting data representing the modification identified in (f).
 26. The method according to claim 1, wherein the ligand is a protein.
 27. The method according to claim 26, wherein the ligand is an antibody.
 28. The method according to claim 26, wherein (f) comprises replacing each of one or more of the amino acid residues of the ligand that is/are in direct contact with the target binding site, or in close proximity to the target binding site, with each of one or more alternative residues chosen from the other 19 natural amino acids, each replacement being referred to as a residue replacement, wherein for each residue replacement that does not cause conflict between the replacement residue and adjacent atoms of the ligand or target, the type and position of each atom of the replacement residue is compared with the respective iota atom type favoured regions to identify whether they will produce a greater intersection than the atoms of the original residue.
 29. The method according to claim 28, further comprising: outputting a list of the residue replacements that are identified as producing a greater intersection than atoms of the original residue; for each listed residue replacement, using mutation of the candidate ligand to produce a modified ligand that incorporates the residue replacement; testing the affinity of each of the modified ligands to the target binding site in order to determine which residue replacements result in an affinity improvement that is above a predetermined threshold.
 30. The method according to claim 29, further comprising: modifying the candidate ligand to incorporate a plurality of the residue replacements that have been determined to result in an affinity improvement that is above the predetermined threshold.
 31. A computer readable medium or signal comprising computer readable instructions for causing a computer to carry out the method of claim 1.
 32. The medium or signal according to claim 31, wherein the computer is caused to carry out at least (c)-(e).
 33. The medium or signal according to claim 31, wherein the computer is caused to carry out at least (f).
 34. A method of manufacturing a therapeutic ligand, comprising: designing a therapeutic ligand according to the method of claim 1; and manufacturing the therapeutic ligand thus designed.
 35. A therapeutic ligand manufactured according to the method of claim 34.
 36. The method or ligand according to claim 34, wherein the ligand is a protein.
 37. The method or ligand according to claim 36, wherein the protein is an antibody.
 38. A method of generating a database for use in a method for designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target, comprising: analysing the relative positions of atoms in each of a plurality of proteins or other biological macromolecules in order to identify instances of a non-bonding intra-molecular contact between a first atom, referred to as a theta atom, and a second atom, referred to as an iota atom, of the protein or macromolecule; and generating a database that for each identified contact specifies: the type of the theta atom, the type of the iota atom, and the position of the iota atom relative to the theta atom; wherein a non-bonding intra-molecular contact is defined as an instance where the following conditions are satisfied: s−Rw≦t, where s is the separation between the theta and iota atoms, Rw is the sum of the van der Waals radii of the theta and iota atoms, and t is a predetermined threshold distance of typically 2.5 angstroms and preferably 0.8 angstroms; and wherein in the case of proteins, the theta and iota atoms are on amino acid residues separated from each other by at least four residues on a linear polypeptide or are on separate polypeptide chains.
 39. The method according to claim 38, wherein the method comprises sub-dividing the database to form groups of identified contacts in which the theta atom is one and only one of the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins and the iota atom is in one and only one of a plurality of non-overlapping groups obtained by sorting the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins into groups based on chemical similarity.
 40. A method of generating a database for use in a method for designing a ligand ab initio that will bind to a binding site of a macromolecular target, or of identifying a modification to a ligand for improving the affinity of the ligand to a binding site of a macromolecular target, comprising: analysing the relative positions of atoms in each of a plurality of proteins or other biological macromolecules in order to identify instances of a non-bonding intra-molecular contact between a first atom, referred to as a theta atom, and a second atom, referred to as an iota atom, of the protein or macromolecule; and generating a database that for each identified contact specifies: the type of the theta atom, the type of the iota atom, and the position of the iota atom relative to the theta atom; wherein a non-bonding intra-molecular contact is defined as an instance where the following condition is satisfied: s−Rw≦t, where s is the separation between the theta and iota atoms, Rw is the sum of the van der Waals radii of the theta and iota atoms, and t is a predetermined threshold distance of typically 2.5 angstroms and preferably 0.8 angstroms; and wherein the method comprises sub-dividing the database to form groups of identified contacts in which the theta atom is one and only one of the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins and the iota atom is in one and only one of a plurality of non-overlapping groups obtained by sorting the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins into groups based on chemical similarity.
 41. The method according to claim 39, wherein: for each contact, the position of the iota atom is defined by geometrical reference to the position of the theta atom and to the positions of third and fourth atoms, the third atom being covalently bonded to the theta atom and the fourth atom being covalently bonded to the third atom, the method further comprising: normalizing the coordinates of the iota atom, theta atom, third atom, and fourth atom of each contact as a group to generate a normalized coordinate group; for each of one or more of the theta atom types, using the normalized coordinate groups for a plurality of contacts involving the theta atom type and a given iota atom type to generate a two-dimensional polar plot that represents a distribution of directions of the given iota atom, in terms of latitude and longitude, relative to the theta atom; repeating the above for different iota atom types; comparing the resultant two-dimensional polar plots to identify groups of iota atom types that yield similar distributions of directions and using those groups as the groups based on chemical similarity to sort the 167 non-hydrogen atoms present in the 20 natural amino acids of proteins into the plurality of non-overlapping groups.
 42. The method of generating a database according to claim 38, comprising: extracting contact information from at least 2000 proteins or other biological macromolecules, the extracted information containing information about at least two million contact atom pairs.
 43. The method of generating a database according to claim 38, comprising: extracting contact information from at least 10000 proteins or other biological macromolecules, the extracted contact information containing information about at least ten million contact atom pairs.
 44. The computer readable medium storing a database generated according to claim 1.
 45. The method according to claim 1, wherein for a given antibody-antigen complex, specific mutations to amino acid residues in or around the antibody binding site are predicted to produce higher binding affinity of the antibody to the antigen.